Abstract

A computable expression for the rate-distortion (RD) function proposed by Heegard and Berger has 
eluded information theory for nearly three decades. Heegard and Berger's single-letter achievability bound 
is well known to be optimal for physically degraded side information; however, it is not known whether 
the bound is optimal for arbitrarily correlated side information (general discrete memoryless sources). 
In this paper, we consider a new setup in which the side information at one receiver is conditionally 
less noisy than the side information at the other. The new setup includes degraded side information as 
a special case, and it is motivated by the literature on degraded and less noisy broadcast channels. Our
key contribution is a converse proving the optimality of Heegard and Berger's achievability bound in
a new setting. The converse rests upon a certain single-letterization lemma, which we prove using an 
information theoretic telescoping identity recently presented by Kramer. We also generalise the above 
ideas to two different successive-refinement problems. 
I. Introduction

Wyner and Ziv's seminal 1976 paper [1] extended rate-distortion (RD) theory to include side
information at the receiver. Nearly a decade later, Heegard and Berger [2] extended the problem setup
of [1] to include multiple receivers with side information: an example of which, and the principal subject
of this paper, is shown in Fig. 1. The RD function of this problem, however, has eluded complete
characterisation in the sense that matching (computable [3, p. 259]) achievability and converse bounds
have yet to be obtained for general discrete memoryless sources.¹

The best single-letter achievability bound for two receivers is due to Heegard and Berger [2, Thm. 2],
and the best bound for three or more receivers is due to Timo, Chan and Grant [5, Thm. 2]. Both bounds
hold for arbitrary discrete memoryless sources under average per-letter distortion constraints. Matching
converses have been obtained for some special cases, with each proof being constructed on a case-by-case
basis, e.g., [2], [6]-[8]. A special case of note is when the side information is physically degraded
in the sense that the side information at one receiver is a noisy version of the side information at the 
other. Heegard and Berger exploited this degraded stochastic structure in their converse [2, pp. 733-734] 
to prove the optimality of their achievability bound. 

In this paper, we consider a new setup in which the side information at one receiver is conditionally 
less noisy than the side information at the other. The setup includes physically degraded side information 
as a special case, and it is motivated by similar, but apparently unrelated, literature on degraded and less 
noisy broadcast channels [9], [10]. Our key contribution is a new converse that proves the optimality
of Heegard and Berger's achievability bound in a new setting (conditionally less noisy sources with a
deterministic-distortion function at one receiver). The converse rests upon a certain single-letterization
lemma, which we prove using an information-theoretic telescoping identity recently presented by Kramer
in [11, Sec. G].

Elements of Heegard and Berger's problem have appeared in many guises throughout the information theory
literature. Special cases of the problem include the almost lossless setup of [6], the complementary side
information setup of [7], [12], and the product side information setup of [8]. Generalisations of the problem
include the Wyner-Ziv successive-refinement work of [13]-[15] and the joint source-channel coding setup
of [16]-[18]. Other variations of the problem have been investigated with causal side information [19],
[20] and common reconstructions [21]. The converse methods presented in this paper may be applicable
to these and other problems, particularly to those with existing results on physically degraded side
information. Indeed, to conclude the paper, we apply our converse methods to obtain new results for two
successive-refinement problems with side information.

¹Matsuta and Uyematsu [4] recently presented matching achievability and converse bounds for Heegard and Berger's RD
function using an information-spectrum approach; these bounds, however, are not computable.

Paper Outline: The remainder of the paper is divided into three sections: Section II presents the
single-letterization lemma that will be key to our main results (converses); Section III presents a new converse
for Heegard and Berger's RD problem shown in Fig. 1; and Section IV presents new converses for two
successive-refinement problems with side information (physically degraded side information [13], [14]
and scalable side information [15]).

Notation: All random variables in this paper are discrete and finite and denoted by uppercase letters,
e.g., $X$. The alphabet of a random variable is written in matching calligraphic font, e.g., $\mathcal{X}$ is the alphabet
of $X$. The $n$-fold Cartesian product of an alphabet is denoted by boldface font, e.g., $\boldsymbol{\mathcal{X}}$ is the $n$-fold
product of $\mathcal{X}$. If a random vector $(X, Y, Z)$ forms a Markov chain in the same order ($X$ is conditionally
independent of $Z$ given $Y$), then we write $X -\circ- Y -\circ- Z$. The symbol $\oplus$ denotes modulo-two addition.

II. A Lemma 

This section concerns a single-letterization (or entropy-characterisation) problem: express the difference
of two $n$-letter conditional mutual informations with a single-letter expression. The lemma in this section
is used to prove our converse results.

Consider a tuple of random variables $(R, S_1, S_2, T, L)$ with an arbitrary joint distribution. Let

$$(\mathbf{R}, \mathbf{S}_1, \mathbf{S}_2, \mathbf{T}, \mathbf{L}) = (R_1, S_{1,1}, S_{2,1}, T_1, L_1), (R_2, S_{1,2}, S_{2,2}, T_2, L_2), \ldots, (R_n, S_{1,n}, S_{2,n}, T_n, L_n) \tag{1}$$

denote a string of $n$ independent and identically distributed (i.i.d.) tuples of $(R, S_1, S_2, T, L)$. Further,
suppose that $J$ is jointly distributed with the $n$-tuple $(\mathbf{R}, \mathbf{S}_1, \mathbf{S}_2, \mathbf{T}, \mathbf{L})$ and

$$J -\circ- (\mathbf{R}, \mathbf{L}) -\circ- (\mathbf{S}_1, \mathbf{S}_2, \mathbf{T}) \tag{2}$$

forms a Markov chain. Consider the following difference of $n$-letter conditional mutual informations:

$$I(J; \mathbf{S}_2 | \mathbf{L}) - I(J; \mathbf{S}_1 | \mathbf{L}). \tag{3}$$

We wish to know whether this difference can be expressed in a single-letter form in the sense of Csiszár
and Körner [3, p. 259]. The next lemma answers this question in the affirmative.

Lemma 1: Let $(J, \mathbf{R}, \mathbf{S}_1, \mathbf{S}_2, \mathbf{T}, \mathbf{L})$ be defined as above. There exists an auxiliary random variable $W$,
jointly distributed with $(R, S_1, S_2, T, L)$ and with alphabet $\mathcal{W}$, such that

$$|\mathcal{W}| \le |\mathcal{R}| |\mathcal{L}|, \tag{4}$$

$$I(J; \mathbf{S}_2 | \mathbf{L}) - I(J; \mathbf{S}_1 | \mathbf{L}) = n \big( I(W; S_2 | L) - I(W; S_1 | L) \big) \tag{5}$$

and

$$W -\circ- (R, L) -\circ- (S_1, S_2, T) \tag{6}$$

forms a Markov chain. If, in addition, $L$ is a function of $R$, then the chain in (6) can be replaced by

$$W -\circ- R -\circ- (S_1, S_2, T) \tag{7}$$

and the cardinality bound in (4) can be tightened to

$$|\mathcal{W}| \le |\mathcal{R}|. \tag{8}$$

The proof of Lemma 1, which is given in Appendix A, makes use of an information-theoretic
telescoping identity recently presented by Kramer in [11, Sec. G].

III. The Heegard-Berger Problem 

This section is devoted to Heegard and Berger's RD problem shown in Fig. 1. Finding a computable
expression for this RD function is a classic, longstanding, open problem in information theory. The
section is arranged as follows: we recall the RD function's operational definition in Section III-A; we
review Heegard and Berger's existing results for degraded side information in Section III-B; and we state
our new results in Section III-C.

A. Operational Definition of the RD Function 

Consider a tuple of random variables $(X, Y_1, Y_2)$ with an arbitrary joint distribution on $\mathcal{X} \times \mathcal{Y}_1 \times \mathcal{Y}_2$.
Let $(\mathbf{X}, \mathbf{Y}_1, \mathbf{Y}_2)$ denote a string of $n$ i.i.d. random vectors $(X, Y_1, Y_2)$, and let $\boldsymbol{\mathcal{X}}$, $\boldsymbol{\mathcal{Y}}_1$, $\boldsymbol{\mathcal{Y}}_2$ denote the
$n$-fold Cartesian products of $\mathcal{X}$, $\mathcal{Y}_1$ and $\mathcal{Y}_2$ respectively. Consider the setup of Fig. 1: the Transmitter
observes $\mathbf{X}$, Receiver 1 observes $\mathbf{Y}_1$ and Receiver 2 observes $\mathbf{Y}_2$. The string $\mathbf{X}$ is to be compressed by
the Transmitter and reconstructed by both receivers using a block code. The RD function is the smallest
rate at which $\mathbf{X}$ can be compressed, while allowing the receivers to reconstruct $\mathbf{X}$ to within specified
average distortions.

An $n$-block code for the setup shown in Fig. 1 consists of three (possibly stochastic) maps. We denote
these maps by

$$f : \boldsymbol{\mathcal{X}} \to \mathcal{M} \tag{9}$$

and

$$g_j : \mathcal{M} \times \boldsymbol{\mathcal{Y}}_j \to \hat{\boldsymbol{\mathcal{X}}}_j, \quad j = 1, 2, \tag{10}$$

where $\mathcal{M}$ is a finite index set with cardinality $|\mathcal{M}|$ depending on $n$, $\hat{\mathcal{X}}_j$ is the reconstruction alphabet
of Receiver $j$ and $\hat{\boldsymbol{\mathcal{X}}}_j$ its $n$-fold Cartesian product. The Transmitter sends $M = f(\mathbf{X})$ and Receiver $j$
reconstructs $\hat{\mathbf{X}}_j = g_j(M, \mathbf{Y}_j)$.

Fig. 1. Rate distortion with side information at two receivers.
Let

$$\delta_j : \mathcal{X} \times \hat{\mathcal{X}}_j \to [0, \infty), \quad j = 1, 2, \tag{11}$$

be bounded per-letter distortion functions. For simplicity, and without loss of generality, we assume
that $\delta_1$ and $\delta_2$ are normal [22, p. 185]; that is, for each $x$ in $\mathcal{X}$ there exists some $\hat{x}$ in $\hat{\mathcal{X}}_j$ such that
$\delta_j(x, \hat{x}) = 0$.

Definition 1: A rate $R$ is said to be $(D_1, D_2)$-achievable if for each $\epsilon > 0$ there exists an $n$-block
code $(f, g_1, g_2)$, for some sufficiently large blocklength $n$, satisfying

$$R + \epsilon \ge \frac{1}{n} \log |\mathcal{M}| \tag{12}$$

and

$$D_j + \epsilon \ge \mathbb{E}\, \frac{1}{n} \sum_{i=1}^{n} \delta_j(X_i, \hat{X}_{j,i}), \quad j = 1, 2. \tag{13}$$

Definition 2 (RD Function):

$$R(D_1, D_2) = \min \{ R \ge 0 : R \text{ is } (D_1, D_2)\text{-achievable} \}, \quad D_1 \ge 0,\ D_2 \ge 0. \tag{14}$$
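To make the operational quantities in (12) and (13) concrete, the following Python sketch evaluates the rate and the empirical per-letter distortion of a toy code. The helper names (`rate`, `avg_distortion`, `hamming`) and the toy "send the source uncoded" scheme are illustrative assumptions and do not appear in the paper.

```python
import math

def hamming(x, x_hat):
    """Per-letter Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0 if x == x_hat else 1

def rate(num_messages, n):
    """Rate (1/n) log2 |M| in bits per source symbol, cf. (12)."""
    return math.log2(num_messages) / n

def avg_distortion(x, x_hat, delta=hamming):
    """Empirical average per-letter distortion of one source string.

    Taking the expectation of this quantity over the source and the code
    gives the right-hand side of (13).
    """
    return sum(delta(a, b) for a, b in zip(x, x_hat)) / len(x)

# Toy setting (assumption): binary source, blocklength n = 4, and an
# encoder that simply sends the source string, so |M| = 2**n.
n = 4
toy_rate = rate(2 ** n, n)

x = (0, 1, 1, 0)
x_hat = (0, 1, 0, 0)  # one of the four symbols is reconstructed incorrectly
toy_distortion = avg_distortion(x, x_hat)
```

Here `toy_rate` is one bit per symbol and `toy_distortion` is one error in four symbols; an actual RD-achieving code would, of course, exploit the side information to use fewer messages.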

B. Existing Results 

Computable single-letter [3] expressions for the RD function have been found in some special cases,
see 0, Q, (SJ. The achievability proofs of these cases all follow from a result by Heegard and Berger [2],
which we review in the next lemma. The converses, in contrast, are derived on a case-by-case basis.
Lemma 2 (Achievability): The RD function is bounded from above by [2, Thm. 2]

$$R(D_1, D_2) \le \min_{(A,B,C)} \Big\{ \max \big\{ I(X; C | Y_1),\ I(X; C | Y_2) \big\} + I(X; A | C, Y_1) + I(X; B | C, Y_2) \Big\}, \tag{15}$$

where the minimisation is taken over all auxiliary random variables $(A, B, C)$, jointly distributed with the
source $(X, Y_1, Y_2)$, such that the following is true:

(i) the auxiliary random variables are conditionally independent of the side information given $X$,

$$(A, B, C) -\circ- X -\circ- (Y_1, Y_2); \tag{16}$$

(ii) the cardinalities of the alphabets of $C$, $A$ and $B$ are respectively bounded by

$$|\mathcal{C}| \le |\mathcal{X}| + 3 \tag{17a}$$
$$|\mathcal{A}| \le |\mathcal{C}| |\mathcal{X}| + 1 \tag{17b}$$
$$|\mathcal{B}| \le |\mathcal{C}| |\mathcal{X}| + 1 \tag{17c}$$

(these cardinality bounds are new, see Appendix B for our proof);

(iii) there exist deterministic maps

$$\phi_1 : \mathcal{A} \times \mathcal{C} \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1 \tag{18a}$$
$$\phi_2 : \mathcal{B} \times \mathcal{C} \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2 \tag{18b}$$

with

$$D_1 \ge \mathbb{E}\, \delta_1\big(X, \phi_1(A, C, Y_1)\big) \tag{19a}$$
$$D_2 \ge \mathbb{E}\, \delta_2\big(X, \phi_2(B, C, Y_2)\big). \tag{19b}$$
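The objective in (15) can be evaluated numerically for any fixed choice of auxiliary variables. The Python sketch below does this for one feasible (not necessarily optimal) choice, namely $C = X$ with $A$ and $B$ constant, which satisfies (16) and gives zero Hamming distortion at both receivers. The toy source (a uniform binary $X$ observed through binary symmetric channels with crossover probabilities 0.25 and 0.10) and all function names are assumptions made for illustration only; no minimisation over $(A, B, C)$ is attempted.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(pmf, idx):
    """Joint entropy (bits) of the coordinates listed in idx."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cond_mi(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

def bsc(x, y, eps):
    """Transition probability of a binary symmetric channel."""
    return 1 - eps if x == y else eps

# Coordinates of each outcome: (x, y1, y2, a, b, c).
X, Y1, Y2, A, B, C = (0,), (1,), (2,), (3,), (4,), (5,)

# Toy source (assumption) with the auxiliary choice C = X, A = B = 0.
pmf = {}
for x, y1, y2 in product((0, 1), repeat=3):
    pmf[(x, y1, y2, 0, 0, x)] = 0.5 * bsc(x, y1, 0.25) * bsc(x, y2, 0.10)

def objective(pmf):
    """Bracketed expression in (15) for the given joint pmf."""
    common = max(cond_mi(pmf, X, C, Y1), cond_mi(pmf, X, C, Y2))
    return common + cond_mi(pmf, X, A, C + Y1) + cond_mi(pmf, X, B, C + Y2)

value = objective(pmf)
```

For this choice the objective collapses to $\max\{H(X|Y_1), H(X|Y_2)\}$, which here equals the binary entropy of 0.25; any such evaluation is only an upper bound on the RD function.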

The next definition and theorem review a special case for which the upper bound of Lemma 2 is tight.
Definition 3: The side information is said to be physically degraded if

$$X -\circ- Y_2 -\circ- Y_1. \tag{20}$$

Theorem 3: If the side information is physically degraded, then [2, Thm. 3]

$$R(D_1, D_2) = \min_{(B,C)} \big\{ I(X; C | Y_1) + I(X; B | C, Y_2) \big\}, \tag{21}$$

where the minimisation is taken over all auxiliary $(B, C)$, jointly distributed with $(X, Y_1, Y_2)$, such that
(i) the auxiliary random variables are conditionally independent of the side information given $X$,

$$(B, C) -\circ- X -\circ- (Y_1, Y_2); \tag{22}$$

(ii) there exist deterministic maps

$$\phi_1 : \mathcal{C} \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1 \tag{23}$$

$$\phi_2 : \mathcal{B} \times \mathcal{C} \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2 \tag{24}$$

with

$$D_1 \ge \mathbb{E}\, \delta_1\big(X, \phi_1(C, Y_1)\big) \tag{25}$$

$$D_2 \ge \mathbb{E}\, \delta_2\big(X, \phi_2(B, C, Y_2)\big). \tag{26}$$



The Markov chain in (20), which defines physically degraded side information, enables a crucial step
in Heegard and Berger's converse of Theorem 3, see [2, pp. 733-734]. The goal of the next section is
to broaden the scope of Theorem 3 by replacing the Markov chain (20) with a more general condition.
Our main results, however, will fall slightly short of this goal: we will need to restrict attention to the
setting where Receiver 1 requires an almost lossless copy of a function of $X$. More specifically, we will
require that $D_1 = 0$ and $\delta_1$ is deterministic in the following sense.



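Definition 4: $\delta_1$ is said to be deterministic [15], [23] if there is an alphabet $\tilde{\mathcal{X}}$ with $\hat{\mathcal{X}}_1 = \tilde{\mathcal{X}}$ and a
deterministic map

$$\psi : \mathcal{X} \to \tilde{\mathcal{X}} \tag{27}$$

such that

$$\delta_1(x, \hat{x}) = \begin{cases} 0 & \text{if } \hat{x} = \psi(x) \\ 1 & \text{otherwise.} \end{cases} \tag{28}$$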
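As a small illustration of Definition 4, the Python sketch below builds a deterministic distortion function from a given map. The function name `make_deterministic_distortion` and the example map (projection of a pair onto its first component) are assumptions chosen for illustration.

```python
def make_deterministic_distortion(psi):
    """Return delta(x, x_hat) = 0 if x_hat == psi(x), and 1 otherwise, cf. (28)."""
    def delta(x, x_hat):
        return 0 if x_hat == psi(x) else 1
    return delta

# Hypothetical example: the source symbol is a pair x = (x1, x2) and
# Receiver 1 is only interested in the first component.
def first_component(x):
    return x[0]

delta1 = make_deterministic_distortion(first_component)

d_correct = delta1((1, 0), 1)  # reconstruction equals psi(x)
d_wrong = delta1((1, 0), 0)    # reconstruction differs from psi(x)
```

With this choice of map, requiring zero distortion at Receiver 1 amounts to asking for an almost lossless copy of the first component of the source.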

For later discussions, we need to specialise Theorem 3 to deterministic $\delta_1$. Let

$$\tilde{X} = \psi(X). \tag{29}$$

Define

$$S(D_2) \triangleq \min_{B} I(X; B | \tilde{X}, Y_2), \quad D_2 \ge 0, \tag{30}$$

where the minimisation is taken over all auxiliary $B$, jointly distributed with $(X, Y_1, Y_2)$, such that

(i) the auxiliary random variable $B$ is conditionally independent of the side information $(Y_1, Y_2)$ given
$X$,

$$B -\circ- X -\circ- (Y_1, Y_2); \tag{31}$$

(ii) the cardinality of the alphabet of $B$ is bounded by

$$|\mathcal{B}| \le |\mathcal{X}| + 1; \tag{32}$$

(iii) there exists a deterministic map

$$\phi_2 : \mathcal{B} \times \tilde{\mathcal{X}} \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2 \tag{33}$$

with

$$D_2 \ge \mathbb{E}\, \delta_2\big(X, \phi_2(B, \tilde{X}, Y_2)\big). \tag{34}$$

The function $S(D_2)$ is non-increasing, convex and continuous in $D_2$, see [1, Thm. A2]. The next corollary
is proved in Appendix E.

Corollary 3.1: If the side information is physically degraded and $\delta_1$ is deterministic, then

$$R(0, D_2) = H(\tilde{X} | Y_1) + S(D_2). \tag{35}$$



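It will be useful to further specialise Corollary 3.1 to the following two-source setting with component
Hamming distortion functions. This specialisation is central to our understanding of how Corollary 3.1
can be generalised.

Definition 5: We say that $(X, Y_1, Y_2)$ is a two-source if

$$\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \quad \text{and} \quad X = (X_1, X_2), \tag{36}$$

where $\mathcal{X}_1$ and $\mathcal{X}_2$ are finite alphabets. In addition, we say that $\delta_1$ and $\delta_2$ are component Hamming
distortion functions if

$$\hat{\mathcal{X}}_j = \mathcal{X}_j \tag{37}$$

and

$$\delta_j\big((x_1, x_2), \hat{x}_j\big) = \begin{cases} 0 & \text{if } \hat{x}_j = x_j \\ 1 & \text{otherwise} \end{cases} \tag{38}$$

for $j = 1, 2$.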

Corollary 3.2: Consider a two-source $(X_1, X_2, Y_1, Y_2)$ with component Hamming distortion functions.
If the side information is physically degraded, i.e.,

$$(X_1, X_2) -\circ- Y_2 -\circ- Y_1, \tag{39}$$

then @, (5J 

$$R(0, 0) = H(X_1 | Y_1) + H(X_2 | X_1, Y_2). \tag{40}$$

The last corollary can be directly proved in a simple way that nicely adds motivation to the possibility 
of a more general converse. 
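Proof outline (converse): If $R$ is achievable, then for each $\epsilon > 0$ and sufficiently large $n$ there exists
an $n$-block code $(f, g_1, g_2)$ for which the following is true:

$$\begin{aligned}
R + \epsilon &\ge \frac{1}{n} \log |\mathcal{M}| && \text{(41)} \\
&\ge \frac{1}{n} H(M) && \text{(42)} \\
&\ge \frac{1}{n} I(\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_1, \mathbf{Y}_2; M) && \text{(43)} \\
&= \frac{1}{n} \Big( I(\mathbf{X}_1, \mathbf{Y}_1; M) + I(\mathbf{X}_2, \mathbf{Y}_2; M | \mathbf{X}_1, \mathbf{Y}_1) \Big) && \text{(44)} \\
&\ge \frac{1}{n} \Big( I(\mathbf{X}_1; M | \mathbf{Y}_1) + I(\mathbf{X}_2; M | \mathbf{X}_1, \mathbf{Y}_1, \mathbf{Y}_2) \Big) && \text{(45)} \\
&\overset{(a)}{\ge} \frac{1}{n} \Big( H(\mathbf{X}_1 | \mathbf{Y}_1) + H(\mathbf{X}_2 | \mathbf{X}_1, \mathbf{Y}_1, \mathbf{Y}_2) - n \varepsilon(n, \epsilon) \Big) && \text{(46)} \\
&\overset{(b)}{=} H(X_1 | Y_1) + H(X_2 | X_1, Y_1, Y_2) - \varepsilon(n, \epsilon) && \text{(47)} \\
&\overset{(c)}{=} H(X_1 | Y_1) + H(X_2 | X_1, Y_2) - \varepsilon(n, \epsilon). && \text{(48)}
\end{aligned}$$

The justification for steps (a), (b) and (c) is as follows.

(a) $\hat{\mathbf{X}}_1$ and $\hat{\mathbf{X}}_2$ are determined by $(M, \mathbf{Y}_1)$ and $(M, \mathbf{Y}_2)$ respectively, so (a) follows by Fano's
inequality [10, Sec. 2.2]. Here the function $\varepsilon(n, \epsilon)$ can be chosen so that $\varepsilon(n, \epsilon) \to 0$ as $\epsilon \to 0$.

(b) $(\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_1, \mathbf{Y}_2)$ is i.i.d.

(c) The side information is physically degraded and consequently $X_2 -\circ- (X_1, Y_2) -\circ- Y_1$.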

Proof outline (achievability): Suppose that we use the Slepian-Wolf / Cover random-binning argument
to send $\mathbf{X}_1$ losslessly to Receiver 1 at rate $R'$ close to $H(X_1 | Y_1)$. The side information is physically
degraded, so we have

$$R' \ge H(X_1 | Y_1) \ge H(X_1 | Y_2). \tag{49}$$

A close inspection of the random binning proof, e.g. [10], reveals that (49) also suffices for Receiver 2
to reliably decode $\mathbf{X}_1$. Now, assuming $\mathbf{X}_1$ is successfully decoded by Receiver 2, we can send $\mathbf{X}_2$
to Receiver 2 at a rate $R''$ close to $H(X_2 | X_1, Y_2)$ using $(\mathbf{X}_1, \mathbf{Y}_2)$ as side information. The total rate
$R = R' + R''$ is close to $H(X_1 | Y_1) + H(X_2 | X_1, Y_2)$. ■



We notice that the Markov chain in (39) is equivalent to

$$X_1 -\circ- Y_2 -\circ- Y_1 \tag{50a}$$

and

$$X_2 -\circ- (X_1, Y_2) -\circ- Y_1. \tag{50b}$$

The chain (50a) is a sufficient, but not necessary, condition for the inequalities in (49) and hence the
above achievability argument. In contrast, the chain (50b) is essential for equality (c) in (48) and hence the
converse argument. The generality of the achievability argument juxtaposed against the more restrictive
converse argument suggests that (40) might hold for a broader class of two-sources. We show that this
is indeed the case in the next subsection; specifically, we will see that (40) still holds when the Markov
chain (50a) is replaced by $H(X_1 | Y_1) \ge H(X_1 | Y_2)$ and the chain (50b) is replaced by a more general
"conditionally less noisy" condition.
Remark 1:

(i) $R(D_1, D_2)$ depends on the joint distribution of $(X, Y_1, Y_2)$ only via the marginal distributions of
$(X, Y_1)$ and $(X, Y_2)$.

(ii) The side information is said to be stochastically degraded if the joint distribution of $(X, Y_1, Y_2)$
is such that there exists some physically degraded side information $(X', Y_1', Y_2')$ with marginals
$(X', Y_1')$ and $(X', Y_2')$ matching those of $(X, Y_1)$ and $(X, Y_2)$. By Remark 1 (i), Theorem 3 and
Corollaries 3.1 and 3.2 also hold for stochastically degraded side information.

(iii) The function $S(D_2)$, which is defined in (30), is the Wyner-Ziv RD function [1, Eqn. (15)] for a
source $X$ with side information $(\tilde{X}, Y_2)$.

(iv) The asserted upper bound for $R(D_1, D_2)$ in [2, Thm. 2] is incorrect for the case of three or more
receivers [5].

C. New Results 

Suppose that $L$ is an auxiliary random variable that is jointly distributed with the source $(X, Y_1, Y_2)$.
Definition 6: We say that $Y_2$ is conditionally less noisy than $Y_1$ given $L$, abbreviated as $(Y_2 \succeq Y_1 \mid L)$,
if

$$I(W; Y_2 | L) \ge I(W; Y_1 | L) \tag{51}$$

holds for every auxiliary $W$, jointly distributed with $(X, Y_1, Y_2, L)$, for which

$$W -\circ- (X, L) -\circ- (Y_1, Y_2). \tag{52}$$

The next lemma and example collectively show that Definition 6 is broader than Definition 3. The
lemma is proved in Appendix C.
Lemma 4:

(i) If the side information $(X, Y_1, Y_2)$ is physically degraded and the auxiliary random variable $L$
satisfies the Markov chain

$$L -\circ- X -\circ- (Y_1, Y_2), \tag{53}$$

then $(Y_2 \succeq Y_1 \mid L)$.

(ii) If a two-source $(X_1, X_2, Y_1, Y_2)$ satisfies

$$X_2 -\circ- X_1 -\circ- Y_1 \tag{54}$$

and $L = X_1$, then $(Y_2 \succeq Y_1 \mid X_1)$.

The next example describes a two-source, where the side information is not degraded, but (54) holds
and therefore $(Y_2 \succeq Y_1 \mid X_1)$.

Example 1: Let $X_2$, $Y_2$ and $Z$ be independent Bernoulli random variables with

$$\mathbb{P}[X_2 = 0] = 1 - \mathbb{P}[X_2 = 1] = p, \quad p \in (0, 1/2), \tag{55}$$
$$\mathbb{P}[Y_2 = 0] = 1 - \mathbb{P}[Y_2 = 1] = q, \quad q \in (0, 1/2), \tag{56}$$
$$\mathbb{P}[Z = 0] = 1 - \mathbb{P}[Z = 1] = r, \quad r \in (0, 1/2). \tag{57}$$

Let

$$X_1 = X_2 \oplus Y_2 \tag{58}$$

and

$$Y_1 = X_1 \oplus Z. \tag{59}$$

We have

$$X_2 -\circ- X_1 -\circ- Y_1, \tag{60}$$

so assertion (ii) of Lemma 4 implies $(Y_2 \succeq Y_1 \mid X_1)$. In contrast, $(X_1, X_2)$ is not conditionally independent
of $Y_1$ given $Y_2$ and, therefore, the side information is not physically degraded.
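The two claims of Example 1 can be checked numerically for specific parameters. The Python sketch below builds the joint distribution of $(X_1, X_2, Y_1, Y_2)$ and computes $I(X_2; Y_1 | X_1)$, which should vanish under (60), and $I(X_1, X_2; Y_1 | Y_2)$, which should be strictly positive when the side information is not physically degraded. The parameter values $p = 0.2$, $q = 0.3$, $r = 0.1$ and the helper names are arbitrary choices made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(pmf, idx):
    """Joint entropy (bits) of the coordinates listed in idx."""
    marg = defaultdict(float)
    for outcome, prob in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += prob
    return -sum(v * math.log2(v) for v in marg.values() if v > 0)

def cond_mi(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))

def example1_pmf(p, q, r):
    """Joint pmf of (X1, X2, Y1, Y2); p, q, r are the probabilities of 0 in (55)-(57)."""
    pmf = defaultdict(float)
    for x2, y2, z in product((0, 1), repeat=3):
        prob = ((p if x2 == 0 else 1 - p)
                * (q if y2 == 0 else 1 - q)
                * (r if z == 0 else 1 - r))
        x1 = x2 ^ y2  # (58)
        y1 = x1 ^ z   # (59)
        pmf[(x1, x2, y1, y2)] += prob
    return pmf

X1, X2, Y1, Y2 = (0,), (1,), (2,), (3,)
pmf = example1_pmf(0.2, 0.3, 0.1)

chain_60 = cond_mi(pmf, X2, Y1, X1)           # zero iff X2 -o- X1 -o- Y1
degraded_gap = cond_mi(pmf, X1 + X2, Y1, Y2)  # zero iff (X1,X2) -o- Y2 -o- Y1
```

A vanishing `chain_60` together with a strictly positive `degraded_gap` is consistent with the example: the chain (60) holds while the degradedness chain (39) fails.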

The next lemma gives a lower bound for the RD function. Its proof uses the single-letterization Lemma 1
and is the subject of Appendix D. Our main result in this section, Theorem 6, follows directly thereafter.

Lemma 5 (Converse): If $\delta_1$ is deterministic, then the following is true.

(i) For arbitrarily distributed $(X, Y_1, Y_2)$, we have

$$R(0, D_2) \ge H(\tilde{X} | Y_1) + S(D_2) + \min_{W} \big\{ I(W; Y_2 | \tilde{X}) - I(W; Y_1 | \tilde{X}) \big\}, \tag{61}$$

where the minimisation is taken over all auxiliary $W$, jointly distributed with $(X, Y_1, Y_2)$, such that

$$W -\circ- X -\circ- (Y_1, Y_2), \tag{62}$$

and

$$|\mathcal{W}| \le |\mathcal{X}|. \tag{63}$$

(ii) If $(X, Y_1, Y_2)$ satisfies $(Y_2 \succeq Y_1 \mid \tilde{X})$, then

$$R(0, D_2) \ge H(\tilde{X} | Y_1) + S(D_2). \tag{64}$$

It is worth highlighting that in the minimisation

$$\min_{W} \big\{ I(W; Y_2 | \tilde{X}) - I(W; Y_1 | \tilde{X}) \big\} \tag{65}$$

it is always possible to choose $W$ to be constant and (65) must therefore be non-positive. Assertion (ii) of
the lemma follows immediately from assertion (i) upon invoking Definition 6 with the auxiliary random
variable $L = \tilde{X}$.

The next theorem gives a single-letter expression for $R(D_1, D_2)$ in a new setting, and it is the main
result of this section. The theorem is a direct consequence of the achievability of Lemma 2 and the
converse of Lemma 5 (ii).

Theorem 6: If $\delta_1$ is deterministic,

$$(Y_2 \succeq Y_1 \mid \tilde{X}) \quad \text{and} \quad H(\tilde{X} | Y_1) \ge H(\tilde{X} | Y_2), \tag{66}$$

then

$$R(0, D_2) = H(\tilde{X} | Y_1) + S(D_2). \tag{67}$$

Proof: The achievability of (67) follows from Lemma 2, where we set $C = \tilde{X}$ and $A = $ constant.
The converse follows by Lemma 5. ■



The next corollary generalises Corollary 3.2 to the conditionally less noisy setting.

Corollary 6.1: Consider a two-source and component Hamming distortion functions. If

$$(Y_2 \succeq Y_1 \mid X_1) \quad \text{and} \quad H(X_1 | Y_1) \ge H(X_1 | Y_2), \tag{68}$$

then

$$R(0, 0) = H(X_1 | Y_1) + H(X_2 | X_1, Y_2). \tag{69}$$

Proof: In Theorem 6, we have $\tilde{X} = X_1$ and

$$S(0) = H(X_2 | X_1, Y_2). \tag{70}$$
Example 2: Let $X_1$ and $Z$ be independent Bernoulli random variables with

$$\mathbb{P}[X_1 = 0] = \mathbb{P}[X_1 = 1] = \frac{1}{2} \tag{71}$$

and

$$\mathbb{P}[Z = 0] = 1 - \mathbb{P}[Z = 1] = \frac{1}{3}. \tag{72}$$

Let

$$X_2 = X_1 \oplus Z. \tag{73}$$

Let $Y_2$ and $Y_1$ be the outcomes of passing $X_1$ through a BEC(2/3) and a BSC(1/4) respectively, see
Fig. 2. We have $(Y_2 \succeq Y_1 \mid X_1)$ from condition (ii) of Lemma 4. Moreover,

$$H(X_1 | Y_2) = \frac{2}{3} \tag{74}$$

is smaller than

$$H(X_1 | Y_1) = H_b(1/4) \approx 0.8113, \tag{75}$$

where

$$H_b(a) = -a \log_2 a - (1 - a) \log_2 (1 - a) \tag{76}$$

is the binary entropy function; therefore, we may apply Corollary 6.1 to get

$$R(0, 0) = H_b(1/4) + H_b(1/3). \tag{77}$$

We notice that since $2/3 > 2/4$ the side information $Y_2$ and $Y_1$ is not physically or stochastically degraded
with respect to $X_1$ [10, p. 121], [24], and hence with respect to $X = (X_1, X_2)$.
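The numerical values in Example 2 can be reproduced with a few lines of Python. The sketch below computes $H(X_1 | Y_2)$ for the erasure channel, $H(X_1 | Y_1)$ for the symmetric channel, and the resulting rate in (77). The helper names and the use of the string `"e"` for the erasure symbol are illustrative assumptions.

```python
import math
from collections import defaultdict

def h_b(a):
    """Binary entropy function (76), in bits."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def cond_entropy(joint):
    """H(X|Y) in bits for a joint pmf given as {(x, y): probability}."""
    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_y[y] += p
    return -sum(p * math.log2(p / p_y[y])
                for (x, y), p in joint.items() if p > 0)

# X1 ~ Bernoulli(1/2) observed through a BEC with erasure probability 2/3.
bec = {}
for x in (0, 1):
    bec[(x, x)] = 0.5 * (1 / 3)
    bec[(x, "e")] = 0.5 * (2 / 3)

# X1 ~ Bernoulli(1/2) observed through a BSC with crossover probability 1/4.
bsc = {(x, y): 0.5 * (0.75 if x == y else 0.25)
       for x in (0, 1) for y in (0, 1)}

h_x1_given_y2 = cond_entropy(bec)   # compare with (74)
h_x1_given_y1 = cond_entropy(bsc)   # compare with (75)
rate_00 = h_b(1 / 4) + h_b(1 / 3)   # right-hand side of (77)
```

The computed values confirm the entropy ordering required by (68), and `rate_00` evaluates to roughly 1.73 bits per source symbol.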
Remark 2:

(i) Theorem 6 includes Corollary 3.1 for physically degraded side information as a special case, since

$$X -\circ- Y_2 -\circ- Y_1 \tag{78}$$

and

$$\tilde{X} -\circ- X -\circ- (Y_1, Y_2) \tag{79}$$

implies (66) by Lemma 4 (i) and the data processing lemma.

(ii) It appears that our approach to proving Lemma 5 (ii) does not readily generalise to an arbitrary
distortion function $\delta_1$. An apparent difficulty follows from the use of a Wyner-Ziv style converse
argument to construct the $S(D_2)$ term using $(\tilde{X}, Y_2)$ as side information. The argument needs
this side information to be i.i.d. and, if $\delta_1$ is arbitrary, this need not be the case.

(iii) Theorem 6 employs the conditionally less noisy definition for the special case where $L$ is a
deterministic function of the source $X$. In this case, we can remove $L$ from the Markov chain
in (52).

(iv) If $L$ is constant, then Definition 6 reduces to the less noisy concept for information-theoretic security
for source coding recently introduced by Villard and Piantanida [25]. Thus, our definition is
broader. In fact, in Example 1 and when the parameter $r$ is sufficiently small (or large) compared to
$p$ so that

$$H(X_1 | Y_1) < H(X_2), \tag{80}$$

the side information $Y_2$ is conditionally less noisy than $Y_1$ given $X_1$, but it is not less noisy. To
see this, select $W = X_1$, so that

$$I(W; Y_1) = H(X_1) - H(X_1 | Y_1) \tag{81}$$

and

$$I(W; Y_2) = H(X_1) - H(X_1 | Y_2) \tag{82}$$
$$\phantom{I(W; Y_2)} = H(X_1) - H(X_2). \tag{83}$$

Fig. 2. Binary channels defining the side information in Example 2: (a) Binary Erasure Channel (BEC) with erasure probability
2/3; and (b) Binary Symmetric Channel (BSC) with crossover probability 1/4. $X_1$ and $Z$ are Bernoulli (1/2) and (1/3), and
$X_2 = X_1 \oplus Z$.
IV. Successive Refinement with Side Information

The method used in Appendix D to prove Lemma 5 can, with appropriate modification, yield useful
converses for various generalisations of Heegard and Berger's RD problem. In this section, we extend
the setup of Fig. 1 to two different successive-refinement problems with receiver side information.

A. Problem Formulation 

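Consider a tuple of random variables $(X, Y_1, Y_2, Y_3)$ with an arbitrary joint distribution. Let $(\mathbf{X}, \mathbf{Y}_1,
\mathbf{Y}_2, \mathbf{Y}_3)$ denote a string of $n$ i.i.d. random vectors $(X, Y_1, Y_2, Y_3)$. A successive-refinement $n$-block code
for the setup shown in Fig. 3 consists of four (possibly stochastic) maps

$$f : \boldsymbol{\mathcal{X}} \to \mathcal{M}_1 \times \mathcal{M}_2 \times \mathcal{M}_3 \tag{84}$$

and

$$g_1 : \mathcal{M}_1 \times \boldsymbol{\mathcal{Y}}_1 \to \hat{\boldsymbol{\mathcal{X}}}_1 \tag{85}$$

$$g_2 : \mathcal{M}_1 \times \mathcal{M}_2 \times \boldsymbol{\mathcal{Y}}_2 \to \hat{\boldsymbol{\mathcal{X}}}_2 \tag{86}$$

$$g_3 : \mathcal{M}_1 \times \mathcal{M}_2 \times \mathcal{M}_3 \times \boldsymbol{\mathcal{Y}}_3 \to \hat{\boldsymbol{\mathcal{X}}}_3, \tag{87}$$

where $\mathcal{M}_1$, $\mathcal{M}_2$ and $\mathcal{M}_3$ are finite sets. The Transmitter sends $(M_1, M_2, M_3) = f(\mathbf{X})$ over the noiseless
channels, as shown in Fig. 3. Receiver 1 reconstructs $\hat{\mathbf{X}}_1 = g_1(M_1, \mathbf{Y}_1)$, Receiver 2 reconstructs $\hat{\mathbf{X}}_2 =
g_2(M_1, M_2, \mathbf{Y}_2)$ and Receiver 3 reconstructs $\hat{\mathbf{X}}_3 = g_3(M_1, M_2, M_3, \mathbf{Y}_3)$.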

Definition 7: A rate tuple $(R_1, R_2, R_3)$ is said to be achievable with distortions $(D_1, D_2, D_3)$ if for
each $\epsilon > 0$ there exists an $n$-block code $(f, g_1, g_2, g_3)$, for some sufficiently large blocklength $n$, satisfying

$$R_j + \epsilon \ge \frac{1}{n} \log |\mathcal{M}_j| \tag{88}$$

and

$$D_j + \epsilon \ge \mathbb{E}\, \frac{1}{n} \sum_{i=1}^{n} \delta_j(X_i, \hat{X}_{j,i}) \tag{89}$$

for $j = 1, 2, 3$.

Definition 8 (RD Region):

$$\mathcal{R}(D_1, D_2, D_3) = \{ (R_1, R_2, R_3) \text{ achievable with distortions } (D_1, D_2, D_3) \}, \tag{90}$$

for $D_1 \ge 0$, $D_2 \ge 0$ and $D_3 \ge 0$.

Fig. 3. Three-stage successive refinement with side information at the receivers.



B. Three Stages with $Y_3$ better than $Y_2$ better than $Y_1$ (e.g., $X -\circ- Y_3 -\circ- Y_2 -\circ- Y_1$)

In this subsection, we assume that Receiver 3 obtains the best side information and Receiver 1 the
worst. Tian and Diggavi [14] modelled such a relation with physically degraded side information, i.e.,
$X -\circ- Y_3 -\circ- Y_2 -\circ- Y_1$, and they derived the corresponding RD region. The goal here is to broaden
their result to a conditionally less noisy setup.

We will need the following achievable RD region that holds for arbitrarily distributed side information.
The region is distilled from a more general achievability result in [5], see Appendix F.

Let $\mathcal{R}_{\text{in}}(D_1, D_2, D_3)$ denote the set of all rate tuples $(R_1, R_2, R_3)$ for which there exist auxiliary
random variables $(A_1, A_2, A_3)$, jointly distributed with the source $(X, Y_1, Y_2, Y_3)$, such that the following
is true:

(i) the auxiliary random variables are conditionally independent of the side information given $X$,

$$(A_1, A_2, A_3) -\circ- X -\circ- (Y_1, Y_2, Y_3); \tag{91}$$

(ii) the cardinalities of the alphabets of $A_1$, $A_2$ and $A_3$ are respectively bounded by²

$$|\mathcal{A}_1| \le |\mathcal{X}| + 6 \tag{92a}$$
$$|\mathcal{A}_2| \le |\mathcal{X}| |\mathcal{A}_1| + 4 \tag{92b}$$
$$|\mathcal{A}_3| \le |\mathcal{X}| |\mathcal{A}_1| |\mathcal{A}_2| + 1; \tag{92c}$$

(iii) there exist (deterministic) maps for each $j = 1, 2, 3$

$$\phi_j : \mathcal{A}_j \times \mathcal{Y}_j \to \hat{\mathcal{X}}_j \tag{93a}$$

with

$$D_j \ge \mathbb{E}\, \delta_j\big(X, \phi_j(A_j, Y_j)\big); \tag{94a}$$

(iv) the rate tuple $(R_1, R_2, R_3)$ satisfies

$$R_1 \ge I(X; A_1 | Y_1), \tag{95a}$$
$$R_1 + R_2 \ge \max_{j = 1, 2} I(X; A_1 | Y_j) + I(X; A_2 | A_1, Y_2), \tag{95b}$$
$$R_1 + R_2 + R_3 \ge \max_{j = 1, 2, 3} I(X; A_1 | Y_j) + \max_{j = 2, 3} I(X; A_2 | A_1, Y_j) + I(X; A_3 | A_1, A_2, Y_3). \tag{95c}$$

Lemma 7: The rates in $\mathcal{R}_{\text{in}}(D_1, D_2, D_3)$ are all achievable; that is,

$$\mathcal{R}_{\text{in}}(D_1, D_2, D_3) \subseteq \mathcal{R}(D_1, D_2, D_3). \tag{96}$$

²Reference [5] does not provide cardinality constraints. The bounds in (92) follow by the standard convex cover method.



The next theorem, which is due to Tian and Diggavi [14], shows that the entire RD region is subsumed
by $\mathcal{R}_{\text{in}}(D_1, D_2, D_3)$ whenever the side information is physically degraded as in (97).
Theorem 8: If the side information is physically degraded in the sense

$$X -\circ- Y_3 -\circ- Y_2 -\circ- Y_1, \tag{97}$$

then [14, Thm. 1]

$$\mathcal{R}_{\text{in}}(D_1, D_2, D_3) = \mathcal{R}(D_1, D_2, D_3). \tag{98}$$

Moreover, the rate constraints in (95) simplify to

$$R_1 \ge I(X; A_1 | Y_1) \tag{99a}$$

$$R_1 + R_2 \ge I(X; A_1 | Y_1) + I(X; A_2 | A_1, Y_2) \tag{99b}$$

$$R_1 + R_2 + R_3 \ge I(X; A_1 | Y_1) + I(X; A_2 | A_1, Y_2) + I(X; A_3 | A_1, A_2, Y_3), \tag{99c}$$

where $A_1$, $A_2$ and $A_3$ obey the cardinality constraints in (92), see also [14, Thm. 1].

The achievability part of Theorem 8 is given by Lemma 7, and the simplified rate constraints in (99)
follow from the Markov chain (97). The converse assertion was proved by Tian and Diggavi in [14,
App. I] and there, again, the Markov chain (97) enabled a crucial step.
We now consider Theorem 8 with conditionally less noisy side information and, as previously,
deterministic distortion functions at Receivers 1 and 2. In particular, Receivers 1 and 2 wish to reconstruct
almost losslessly

$$\tilde{X}_1 = \psi_1(X) \quad \text{and} \quad \tilde{X}_2 = \psi_2(X), \tag{100}$$

respectively, where $\psi_1$ and $\psi_2$ are functions of the form

$$\psi_j : \mathcal{X} \to \tilde{\mathcal{X}}_j, \quad j = 1, 2. \tag{101}$$

Theorem 8 with deterministic $\delta_1$ and $\delta_2$ simplifies as follows. Define

$$S'(D_3) \triangleq \min_{A_3} I(X; A_3 | \tilde{X}_1, \tilde{X}_2, Y_3), \quad D_3 \ge 0,$$

where the minimisation is taken over all auxiliary $A_3$, jointly distributed with $(X, Y_1, Y_2, Y_3)$, such that
the following is true:

(i) the auxiliary random variable is conditionally independent of the side information given $X$,

$$A_3 -\circ- X -\circ- (Y_1, Y_2, Y_3); \tag{102}$$

(ii) the cardinality of the alphabet of $A_3$ is bounded by

$$|\mathcal{A}_3| \le |\mathcal{X}| + 1; \tag{103}$$

(iii) there exists a (deterministic) map

$$\phi_3 : \mathcal{A}_3 \times \tilde{\mathcal{X}}_1 \times \tilde{\mathcal{X}}_2 \times \mathcal{Y}_3 \to \hat{\mathcal{X}}_3 \tag{104}$$

with

$$D_3 \ge \mathbb{E}\, \delta_3\big(X, \phi_3(A_3, \tilde{X}_1, \tilde{X}_2, Y_3)\big). \tag{105}$$

Corollary 8.1: If the side information is physically degraded as in (97) and $\delta_1$ and $\delta_2$ are deterministic,
then $\mathcal{R}(0, 0, D_3)$ is equal to the set of all rate tuples $(R_1, R_2, R_3)$ satisfying

$$R_1 \ge H(\tilde{X}_1 | Y_1) \tag{106a}$$
$$R_1 + R_2 \ge H(\tilde{X}_1 | Y_1) + H(\tilde{X}_2 | \tilde{X}_1, Y_2) \tag{106b}$$
$$R_1 + R_2 + R_3 \ge H(\tilde{X}_1 | Y_1) + H(\tilde{X}_2 | \tilde{X}_1, Y_2) + S'(D_3). \tag{106c}$$

Proof: The achievability part follows directly from Theorem 8 upon selecting the auxiliary random
variables as $A_1 = \tilde{X}_1$ and $A_2 = \tilde{X}_2$ as well as recalling the definition of $S'(D_3)$. The converse can be
proved following arguments similar to those used in Appendix E and is omitted for brevity. ■
The next lemma is a converse for arbitrarily distributed side information: it is a successive-refinement
analogue of Lemma 5. Let $\mathcal{R}_{\text{out}}(D_3)$ denote the set of all rate tuples $(R_1, R_2, R_3)$ for which

$$R_1 \ge H(\tilde{X}_1 | Y_1) \tag{107}$$

$$R_1 + R_2 \ge H(\tilde{X}_1 | Y_1) + H(\tilde{X}_2 | \tilde{X}_1, Y_2) + \min_{W} \big\{ I(W; Y_2 | \tilde{X}_1) - I(W; Y_1 | \tilde{X}_1) \big\} \tag{108}$$

$$\begin{aligned}
R_1 + R_2 + R_3 \ge{} & H(\tilde{X}_1 | Y_1) + H(\tilde{X}_2 | \tilde{X}_1, Y_2) + S'(D_3) + \min_{W} \big\{ I(W; Y_2 | \tilde{X}_1) - I(W; Y_1 | \tilde{X}_1) \big\} \\
& + \min_{W} \big\{ I(W; Y_3 | \tilde{X}_1, \tilde{X}_2) - I(W; Y_2 | \tilde{X}_1, \tilde{X}_2) \big\},
\end{aligned} \tag{109}$$

where each minimisation is independently taken over all auxiliary $W$, jointly distributed with $(X, Y_1, Y_2,
Y_3)$, such that $|\mathcal{W}| \le |\mathcal{X}|$ and $W -\circ- X -\circ- (Y_1, Y_2, Y_3)$.
Lemma 9 (Converse): If $\delta_1$ and $\delta_2$ are deterministic, then

$$\mathcal{R}_{\text{out}}(D_3) \supseteq \mathcal{R}(0, 0, D_3). \tag{110}$$

Our proof of Lemma 9 is quite similar to that of Lemma 5, and it is given in Appendix G. The next
theorem shows that the outer bound (converse) of Lemma 9 matches the inner bound (achievability) of
Lemma 7 for a certain conditionally less noisy setting.

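Theorem 10: If $\delta_1$ and $\delta_2$ are deterministic,

$$(Y_2 \succeq Y_1 \mid \tilde{X}_1) \quad \text{and} \quad (Y_3 \succeq Y_2 \mid \tilde{X}_1, \tilde{X}_2), \tag{111}$$

as well as

$$H(\tilde{X}_1 | Y_1) \ge \max \big\{ H(\tilde{X}_1 | Y_2),\ H(\tilde{X}_1 | Y_3) \big\}, \tag{112a}$$
$$H(\tilde{X}_2 | \tilde{X}_1, Y_2) \ge H(\tilde{X}_2 | \tilde{X}_1, Y_3), \tag{112b}$$

then $\mathcal{R}(0, 0, D_3)$ is equal to the set of all rate tuples $(R_1, R_2, R_3)$ satisfying (106), i.e.,

$$R_1 \ge H(\tilde{X}_1 | Y_1) \tag{113a}$$

$$R_1 + R_2 \ge H(\tilde{X}_1 | Y_1) + H(\tilde{X}_2 | \tilde{X}_1, Y_2) \tag{113b}$$

$$R_1 + R_2 + R_3 \ge H(\tilde{X}_1 | Y_1) + H(\tilde{X}_2 | \tilde{X}_1, Y_2) + S'(D_3). \tag{113c}$$

Proof: The converse follows directly by Lemma 9 and uses the conditionally less noisy assumptions
(111). The achievability follows by Lemma 7 with $A_1 = \tilde{X}_1$ and $A_2 = \tilde{X}_2$ and uses inequalities (112). ■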



Remark 3: Steinberg and Merhav [13] were the first to consider and solve the two-stage successive
refinement problem with physically degraded side information. Tian and Diggavi's work [14] generalises
Steinberg and Merhav's result to three or more stages with physically degraded side information.
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C. Two Stages with Y\ better than Y 2 (abhinc X—o— Y\— o— Y2) 

Reconsider the successive-refinement problem in Fig. 3, but now with only two receivers, Receivers 1
and 2. Moreover, suppose that the side information at Receiver 1 is better than the side information at
Receiver 2. Side information scalable source coding refers to the special case where

$$X \leftrightarrow Y_1 \leftrightarrow Y_2. \tag{114}$$

Notice that the roles of $Y_1$ and $Y_2$ in (114) are reversed with respect to Definition 3 and Theorem 8.
In contrast to Theorem 8, however, there is no known computable expression for the RD region in this
setting. Tian and Diggavi give achievability and converse bounds in [15], and they show that these bounds
match for degraded deterministic distortion measures. We wish to relax the Markov chain in (114) to a
conditionally less noisy setting and yet still recover the special case results of Tian and Diggavi.

The next lemma gives an achievable rate region for arbitrarily distributed side information. As in
Lemma 7, the rate constraints can be distilled from the rate constraints in [5] (see Appendix F), and the
cardinality bounds can be derived by the standard convex cover method. The lemma includes Tian and
Diggavi's bound [15, Cor. 1] for arbitrarily distributed side information as a special case.

Let $\mathcal{R}^*_{\text{in}}(D_1, D_2)$ denote the set of all rate pairs $(R_1,R_2)$ for which there exist auxiliary random
variables $(A_{12}, A_1, A_2)$, jointly distributed with the source $(X, Y_1, Y_2)$, such that the following is true:

(i) there is a Markov chain,

$$(A_{12}, A_1, A_2) \leftrightarrow X \leftrightarrow (Y_1, Y_2); \tag{115}$$

(ii) the cardinalities of the alphabets of $A_{12}$, $A_1$ and $A_2$ respectively satisfy

\begin{align}
|\mathcal{A}_{12}| &\le |\mathcal{X}| + 3 \tag{116}\\
|\mathcal{A}_1| &\le |\mathcal{X}|\,|\mathcal{A}_{12}| + 1 \tag{117}\\
|\mathcal{A}_2| &\le |\mathcal{X}|\,|\mathcal{A}_{12}| + 1; \tag{118}
\end{align}

(iii) there exist deterministic maps for $j = 1, 2$,

$$\phi_j : \mathcal{A}_j \times \mathcal{Y}_j \to \hat{\mathcal{X}}_j, \tag{119}$$

with

$$D_j \ge \mathbb{E}\,\delta_j\big(X, \phi_j(A_j, Y_j)\big); \tag{120}$$

(iv) the rate pair $(R_1,R_2)$ satisfies

\begin{align}
R_1 &\ge I(X; A_{12}, A_1 | Y_1) \tag{121a}\\
R_1 + R_2 &\ge \max\{I(X; A_{12}|Y_1),\, I(X; A_{12}|Y_2)\} + I(X; A_1|A_{12}, Y_1) + I(X; A_2|A_{12}, Y_2). \tag{121b}
\end{align}

Lemma 11: The rate pairs in $\mathcal{R}^*_{\text{in}}(D_1, D_2)$ are all achievable; that is,

$$\mathcal{R}^*_{\text{in}}(D_1,D_2) \subseteq \mathcal{R}(D_1,D_2). \tag{122}$$

The next and final result of the paper generalises Tian and Diggavi's result [15, Thm. 4], which holds
under the Markov chain in (114), to a conditionally less noisy setting. Suppose $\delta_1$ and $\delta_2$ are deterministic,
with $X_1 = \psi_1(X)$ and $X_2 = \psi_2(X)$. It is said that $\delta_2$ is a degraded version of $\delta_1$ if

$$\psi_2 = \psi' \circ \psi_1 \tag{123}$$

for some deterministic map $\psi'$. The next theorem is proved in Appendix H.
Theorem 12: Suppose that $\delta_1$ and $\delta_2$ are deterministic.
(i) If $\delta_2$ is a degraded version of $\delta_1$,

$$H(X_2|Y_1) \le H(X_2|Y_2) \quad\text{and}\quad (Y_1 \succeq Y_2 \mid X_2), \tag{124}$$

then $\mathcal{R}^*_{\text{in}}(0,0) = \mathcal{R}(0,0)$ and the rate constraints of (121) simplify to

\begin{align}
R_1 &\ge H(X_1|Y_1) \tag{125a}\\
R_1 + R_2 &\ge H(X_2|Y_2) + H(X_1|X_2,Y_1). \tag{125b}
\end{align}

(ii) If $\delta_1$ is a degraded version of $\delta_2$ and

$$H(X_1|Y_1) \le H(X_1|Y_2) \tag{126}$$

then $\mathcal{R}^*_{\text{in}}(0,0) = \mathcal{R}(0,0)$ and the rate constraints of (121) simplify to

\begin{align}
R_1 &\ge H(X_1|Y_1) \tag{127a}\\
R_1 + R_2 &\ge H(X_2|Y_2). \tag{127b}
\end{align}
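
As a purely illustrative numerical sketch (not part of the original development), the bounds of Theorem 12(ii) can be evaluated for a toy source. Everything below is an assumption made for illustration: the source pmf, the side information $Y_1$ and $Y_2$, the maps `psi1` and `psi2`, and the helper `cond_entropy`. The maps are chosen so that `psi1` is a function of `psi2`, i.e. $\delta_1$ is a degraded version of $\delta_2$.

```python
from collections import defaultdict
from math import log2


def cond_entropy(pmf, f, g):
    """Return H(f(omega) | g(omega)) in bits for a pmf {omega: probability}."""
    joint = defaultdict(float)
    marginal = defaultdict(float)
    for omega, p in pmf.items():
        joint[(f(omega), g(omega))] += p
        marginal[g(omega)] += p
    return -sum(p * log2(p / marginal[b]) for (_, b), p in joint.items() if p > 0)


# Hypothetical toy source: X uniform on {0,1,2,3}, Y1 = high bit of X, Y2 constant.
# Each outcome is the triple (x, y1, y2).
pmf = {(x, x // 2, 0): 0.25 for x in range(4)}

psi2 = lambda x: x        # Receiver 2 reconstructs X itself
psi1 = lambda x: x % 2    # Receiver 1 reconstructs the low bit, a function of psi2(x)

h_x1_y1 = cond_entropy(pmf, lambda w: psi1(w[0]), lambda w: w[1])  # H(X1|Y1)
h_x1_y2 = cond_entropy(pmf, lambda w: psi1(w[0]), lambda w: w[2])  # H(X1|Y2)
h_x2_y2 = cond_entropy(pmf, lambda w: psi2(w[0]), lambda w: w[2])  # H(X2|Y2)

# Condition (126), checked with a small numerical tolerance.
condition_126 = h_x1_y1 <= h_x1_y2 + 1e-12
print(f"(126) holds: {condition_126}")
print(f"(127a): R1 >= {h_x1_y1:.3f} bits")
print(f"(127b): R1 + R2 >= {h_x2_y2:.3f} bits")
```

For this toy source, condition (126) holds with equality and the bounds evaluate to 1 bit for (127a) and 2 bits for (127b).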

Appendix A
Proof of Lemma 1

A. Preliminaries

The proof will make use of the following telescoping identity. For any string of arbitrarily distributed
random variables, $(A_1,B_1), (A_2,B_2), \ldots, (A_n,B_n)$, we have [11, Sec. G]

$$\sum_{i=1}^{n} I(A^{i}; B_{i+1}^{n}) = \sum_{i=1}^{n} I(A^{i-1}; B_{i}^{n}), \tag{128}$$

with the notational conventions

$$A_j^k \triangleq A_j, A_{j+1}, \ldots, A_k \quad\text{and}\quad B_j^k \triangleq B_j, B_{j+1}, \ldots, B_k \tag{129}$$

for $1 \le j \le k \le n$ (writing $A^k$ for $A_1^k$) as well as

$$I(A^{n}; B_{n+1}^{n}) \triangleq 0 \quad\text{and}\quad I(A^{0}; B_{1}^{n}) \triangleq 0. \tag{130}$$

These notations are used throughout the proof.

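We first prove (5). Notice that

$$I(J;\mathbf{S}_2|\mathbf{L}) - I(J;\mathbf{S}_1|\mathbf{L}) = I(J;\mathbf{S}_2,\mathbf{L}) - I(J;\mathbf{S}_1,\mathbf{L}), \tag{131}$$

by the chain rule for mutual information. Expand the first mutual information term $I(J;\mathbf{S}_2,\mathbf{L})$ on the
right hand side of (131) as follows:

\begin{align}
I(J;\mathbf{S}_2,\mathbf{L}) &\overset{(a)}{=} \sum_{i=1}^{n} I\big(J; S_{2,i}, L_i \,\big|\, S_{2,i+1}^{n}, L_{i+1}^{n}\big) \tag{132}\\
&\overset{(b)}{=} \sum_{i=1}^{n} I\big(J, S_{2,i+1}^{n}, L_{i+1}^{n}; S_{2,i}, L_i\big) \tag{133}\\
&\overset{(c)}{=} \sum_{i=1}^{n} \Big( I\big(J, S_{1,1}^{i-1}, S_{2,i+1}^{n}, L_{1}^{i-1}, L_{i+1}^{n}; S_{2,i}, L_i\big) \notag\\
&\qquad\quad - I\big(S_{1,1}^{i-1}, L_{1}^{i-1}; S_{2,i}, L_i \,\big|\, J, S_{2,i+1}^{n}, L_{i+1}^{n}\big) \Big) \tag{134}\\
&\overset{(d)}{=} \sum_{i=1}^{n} \Big( I\big(W_i; S_{2,i}, L_i\big) - I\big(S_{1,1}^{i-1}, L_{1}^{i-1}; S_{2,i}, L_i \,\big|\, J, S_{2,i+1}^{n}, L_{i+1}^{n}\big) \Big), \tag{135}
\end{align}

where (a) and (c) follow from the chain rule for mutual information; (b) exploits the fact that the source
is i.i.d. and therefore

$$H\big(S_{2,i}, L_i \,\big|\, S_{2,i+1}^{n}, L_{i+1}^{n}\big) = H(S_{2,i}, L_i); \tag{136}$$

and, finally, in (d) we define and substitute the random variable

$$W_i \triangleq \big(J, S_{1,1}^{i-1}, S_{2,i+1}^{n}, L_{1}^{i-1}, L_{i+1}^{n}\big). \tag{137}$$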



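Expand the second mutual information term $I(J;\mathbf{S}_1,\mathbf{L})$ on the right hand side of (131) using the
telescoping identity (128) as follows:

\begin{align}
I(J;\mathbf{S}_1,\mathbf{L}) &\overset{(a)}{=} \sum_{i=1}^{n} \Big( I\big(J, S_{2,i+1}^{n}, L_{i+1}^{n}; S_{1,1}^{i}, L_{1}^{i}\big) - I\big(J, S_{2,i}^{n}, L_{i}^{n}; S_{1,1}^{i-1}, L_{1}^{i-1}\big) \Big) \tag{138}\\
&\overset{(b)}{=} \sum_{i=1}^{n} \Big( I\big(J, S_{2,i+1}^{n}, L_{i+1}^{n}; S_{1,i}, L_i \,\big|\, S_{1,1}^{i-1}, L_{1}^{i-1}\big) \notag\\
&\qquad\quad - I\big(S_{2,i}, L_i; S_{1,1}^{i-1}, L_{1}^{i-1} \,\big|\, J, S_{2,i+1}^{n}, L_{i+1}^{n}\big) \Big) \tag{139}\\
&\overset{(c)}{=} \sum_{i=1}^{n} \Big( I\big(J, S_{1,1}^{i-1}, S_{2,i+1}^{n}, L_{1}^{i-1}, L_{i+1}^{n}; S_{1,i}, L_i\big) \notag\\
&\qquad\quad - I\big(S_{2,i}, L_i; S_{1,1}^{i-1}, L_{1}^{i-1} \,\big|\, J, S_{2,i+1}^{n}, L_{i+1}^{n}\big) \Big) \tag{140}\\
&\overset{(d)}{=} \sum_{i=1}^{n} \Big( I\big(W_i; S_{1,i}, L_i\big) - I\big(S_{2,i}, L_i; S_{1,1}^{i-1}, L_{1}^{i-1} \,\big|\, J, S_{2,i+1}^{n}, L_{i+1}^{n}\big) \Big), \tag{141}
\end{align}

where (a) invokes the telescoping identity (128) and the chain rule for mutual information; (b) again uses
the chain rule for mutual information; (c) exploits the i.i.d. source and hence

$$H\big(S_{1,i}, L_i \,\big|\, S_{1,1}^{i-1}, L_{1}^{i-1}\big) = H(S_{1,i}, L_i); \tag{142}$$

and, finally, in (d) we substitute $W_i = (J, S_{1,1}^{i-1}, S_{2,i+1}^{n}, L_{1}^{i-1}, L_{i+1}^{n})$.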
Subtract (141) from (135) to obtain

$$I(J;\mathbf{S}_2,\mathbf{L}) - I(J;\mathbf{S}_1,\mathbf{L}) = \sum_{i=1}^{n} \Big( I(W_i; S_{2,i}, L_i) - I(W_i; S_{1,i}, L_i) \Big). \tag{143}$$



We now single-letterize the quantity on the right hand side of (143). To this end, we introduce a time-sharing
random variable: let $Q$ be uniform on $\{1, 2, \ldots, n\}$ and independent of the tuple $(\mathbf{R}, \mathbf{S}_1, \mathbf{S}_2, \mathbf{T}, \mathbf{L})$.
Dividing (143) by $n$, we have

\begin{align}
&\frac{1}{n}\sum_{i=1}^{n} \Big( I(W_i; S_{2,i}, L_i) - I(W_i; S_{1,i}, L_i) \Big) \tag{144}\\
&\quad\overset{(a)}{=} \frac{1}{n}\sum_{i=1}^{n} \Big( I(W_i; S_{2,i}, L_i \,|\, Q = i) - I(W_i; S_{1,i}, L_i \,|\, Q = i) \Big) \tag{145}\\
&\quad\overset{(b)}{=} I(W_Q; S_{2,Q}, L_Q \,|\, Q) - I(W_Q; S_{1,Q}, L_Q \,|\, Q) \tag{146}\\
&\quad\overset{(c)}{=} I(W_Q, Q; S_{2,Q}, L_Q) - I(W_Q, Q; S_{1,Q}, L_Q) \tag{147}\\
&\quad\overset{(d)}{=} I(W; S_2, L) - I(W; S_1, L), \tag{148}
\end{align}

where in (a) we use that $Q$ is independent of $(S_{1,i}, S_{2,i}, W_i)$; in (b) that $Q$ is uniformly distributed;
in (c) that $(\mathbf{S}_1, \mathbf{S}_2, \mathbf{L})$ is i.i.d. and independent of $Q$, and therefore

$$H(S_{1,Q}, L_Q \,|\, Q) = H(S_{1,Q}, L_Q); \tag{149}$$

and, finally, in (d) we define and substitute

$$W \triangleq (W_Q, Q), \quad S_1 \triangleq S_{1,Q}, \quad S_2 \triangleq S_{2,Q}, \quad\text{and}\quad L \triangleq L_Q. \tag{150}$$

From (143) and (148), we have

$$I(J;\mathbf{S}_2,\mathbf{L}) - I(J;\mathbf{S}_1,\mathbf{L}) = n\big( I(W; S_2, L) - I(W; S_1, L) \big). \tag{151}$$

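We also notice that

$$W_i \leftrightarrow (R_i, L_i) \leftrightarrow (S_{1,i}, S_{2,i}, T_i) \tag{152}$$

forms a Markov chain for all $i = 1, 2, \ldots, n$. Each of the $n$ Markov chains in (152) follows from the
definition of $W_i$, the $n$-letter chain

$$J \leftrightarrow (\mathbf{R}, \mathbf{L}) \leftrightarrow (\mathbf{S}_1, \mathbf{S}_2, \mathbf{T}), \tag{153}$$

and the fact that $(\mathbf{R}, \mathbf{S}_1, \mathbf{S}_2, \mathbf{T}, \mathbf{L})$ is i.i.d. Now define

$$R \triangleq R_Q \quad\text{and}\quad T \triangleq T_Q. \tag{154}$$

Using the independence of $Q$ from $(\mathbf{R}, \mathbf{T}, \mathbf{S}_1, \mathbf{S}_2, \mathbf{L})$, we have the desired Markov chain,

$$W \leftrightarrow (R, L) \leftrightarrow (S_1, S_2, T). \tag{155}$$

It remains to show that the auxiliary random variable $W$, whose alphabet cardinality is unbounded
in $n$, can be replaced by some $\mathsf{W}$ with an alphabet satisfying the cardinality bound of the lemma. We now prove the existence of such
a $\mathsf{W}$ using the convex cover method of, for example, [10, App. C].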

For each and every $w$ in the support set of $W$, let $q_w$ denote the conditional distribution of $(R, S_1, S_2, T, L)$
given $W = w$. Let $\mathcal{P}$ denote the set of all joint distributions on $\mathcal{R} \times \mathcal{S}_1 \times \mathcal{S}_2 \times \mathcal{T} \times \mathcal{L}$.

For each and every pair $(r, l)$ in $\mathcal{R} \times \mathcal{L}$ but one (the omitted pair, say $(r^*, l^*)$, can be chosen
arbitrarily) define the functional $g_{r,l} : \mathcal{P} \to [0,1]$,

$$g_{r,l}(q) \triangleq \sum_{s_1 \in \mathcal{S}_1} \sum_{s_2 \in \mathcal{S}_2} \sum_{t \in \mathcal{T}} q(r, s_1, s_2, t, l). \tag{156}$$

The $(|\mathcal{R}||\mathcal{L}| - 1)$ functionals defined in (156) will be used to preserve the joint distribution of $(R, S_1, S_2, T, L)$
when the Support Lemma [10, App. C] is invoked shortly. Indeed, we notice that for each
such pair $(r, l)$ the expectation

$$\mathbb{E}_W\{g_{r,l}(q_W)\} = \sum_{w} \mathbb{P}[W = w]\, g_{r,l}(q_w) \tag{157}$$

is equal to the true probability $\mathbb{P}[(R, L) = (r, l)]$. Moreover, this agreement extends over $\mathcal{R} \times \mathcal{S}_1 \times \mathcal{S}_2 \times \mathcal{T} \times \mathcal{L}$ because

$$\mathbb{E}_W\{g_{r,l}(q_W)\} \cdot \mathbb{P}[S_1 = s_1, S_2 = s_2, T = t \,|\, R = r, L = l] \tag{158}$$

is equal to the true joint probability $\mathbb{P}[R = r, S_1 = s_1, S_2 = s_2, T = t, L = l]$.

If the joint distribution of $(R, L, S_1, S_2, T)$ is preserved, we can additionally preserve the difference

$$I(W; S_2, L) - I(W; S_1, L) \tag{159}$$

by simply preserving $H(S_2, L|W) - H(S_1, L|W)$. To this end, define

$$g(q) \triangleq H(\mathsf{S}_2, \mathsf{L}) - H(\mathsf{S}_1, \mathsf{L}), \tag{160}$$

where the joint distribution$^3$ of $(\mathsf{R}, \mathsf{S}_1, \mathsf{S}_2, \mathsf{T}, \mathsf{L})$ is understood to be given by $q$. We also notice that

\begin{align}
\mathbb{E}_W\{g(q_W)\} &= \sum_{w} \mathbb{P}[W = w]\, g(q_w) \tag{161}\\
&= H(S_2, L|W) - H(S_1, L|W). \tag{162}
\end{align}

The Support Lemma asserts that there exists an auxiliary random variable $\mathsf{W}$ defined on an alphabet
$\mathsf{\mathcal{W}}$ with cardinality

$$|\mathsf{\mathcal{W}}| \le |\mathcal{R}||\mathcal{L}|$$

and a collection of (conditional) joint distributions $\{\mathsf{q}_w\}$ from $\mathcal{P}$, indexed by the elements $w$ of $\mathsf{\mathcal{W}}$, such
that

(i) for all $(r, l)$ in $\mathcal{R} \times \mathcal{L}$, excluding the omitted pair $(r^*, l^*)$, we have

$$\mathbb{E}_W\{g_{r,l}(q_W)\} = \mathbb{E}_{\mathsf{W}}\{g_{r,l}(\mathsf{q}_{\mathsf{W}})\}; \tag{163}$$

(ii) and

$$\mathbb{E}_W\{g(q_W)\} = \mathbb{E}_{\mathsf{W}}\{g(\mathsf{q}_{\mathsf{W}})\}. \tag{164}$$

The new auxiliary random variable $\mathsf{W}$ and the distributions $\{\mathsf{q}_w\}$ induce a joint distribution on $\mathsf{\mathcal{W}} \times \mathcal{R} \times \mathcal{L}$.
The equality (163) ensures that the $(R, L)$-marginal of this new distribution is equal to the true
distribution of $(R, L)$. This agreement extends to the full joint distribution via (158); i.e., we impose the
Markov chain

$$\mathsf{W} \leftrightarrow (R, L) \leftrightarrow (S_1, S_2, T). \tag{165}$$

Finally, the equalities (163) and (164) imply

$$I(\mathsf{W}; S_2, L) - I(\mathsf{W}; S_1, L) = I(W; S_2, L) - I(W; S_1, L). \tag{166}$$

$^3$ We use sans serif font to emphasise that this joint distribution differs from that of $(R, S_1, S_2, T, L)$.

Remark 4:

(i) A consequence of the telescoping identity (128) is the classic Csiszár sum identity [10, Sec. 2.4],

$$\sum_{i=1}^{n} I\big(A_i; B_{i+1}^{n} \,\big|\, A^{i-1}\big) = \sum_{i=1}^{n} I\big(B_i; A^{i-1} \,\big|\, B_{i+1}^{n}\big). \tag{167}$$

The proof of Lemma 1 can be manipulated so as to replace the telescoping sum identity step (141)
with a Csiszár sum identity step. We feel that the telescoping approach gives a cleaner proof.

(ii) We note that steps (a) and (b) of (141) are reminiscent of those used in Kramer's converse for the
Gelfand-Pinsker problem (coding for channels with state), see [11, Sec. F] or [26, Sec. 6.6]. It is
not clear, as yet, whether there is a deeper relationship between the two problems.

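Appendix B

Proof of Cardinality Bounds (17) of Lemma 2

Suppose that we have auxiliary random variables $(A, B, C)$ as well as functions $\phi_1$ and $\phi_2$ that satisfy
the Markov chain (16) and the average distortion condition (19), but not the cardinality bounds (17); i.e.,
the alphabets $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ are finite but otherwise arbitrary.

Consider the variable $C$. For each and every $c$ in the support set of $C$, let $q_c$ denote the conditional
distribution of $(A, B, X)$ given $C = c$. Let $\mathcal{P}_1$ denote the set of all joint distributions on $\mathcal{A} \times \mathcal{B} \times \mathcal{X}$.

For each and every $x$ in $\mathcal{X}$ but one, say $x^*$, define $g_x : \mathcal{P}_1 \to [0, 1]$ by setting

$$g_x(q) \triangleq \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} q(a, b, x). \tag{168}$$

We notice that, for all $x$ except $x^*$,

$$\mathbb{E}_C\{g_x(q_C)\} = \mathbb{P}[X = x] \tag{169}$$

gives the true marginal distribution of $X$. Now define the following functionals, each mapping $\mathcal{P}_1$ to
$[0, \infty]$, by setting

\begin{align}
g_1(q) &\triangleq I(\mathsf{X}; \mathsf{B}|\mathsf{Y}_2) - H(\mathsf{X}|\mathsf{A}, \mathsf{Y}_1) \tag{170}\\
g_2(q) &\triangleq I(\mathsf{X}; \mathsf{A}|\mathsf{Y}_1) - H(\mathsf{X}|\mathsf{B}, \mathsf{Y}_2) \tag{171}\\
g_3(q) &\triangleq \sum_{a \in \mathcal{A}} \sum_{y_1 \in \mathcal{Y}_1} \min_{\hat{x} \in \hat{\mathcal{X}}_1} \sum_{b \in \mathcal{B}} \sum_{x \in \mathcal{X}} \sum_{y_2 \in \mathcal{Y}_2} q(a, b, x)\, p(y_1, y_2|x)\, \delta_1(x, \hat{x}) \tag{172}\\
g_4(q) &\triangleq \sum_{b \in \mathcal{B}} \sum_{y_2 \in \mathcal{Y}_2} \min_{\hat{x} \in \hat{\mathcal{X}}_2} \sum_{a \in \mathcal{A}} \sum_{x \in \mathcal{X}} \sum_{y_1 \in \mathcal{Y}_1} q(a, b, x)\, p(y_1, y_2|x)\, \delta_2(x, \hat{x}), \tag{173}
\end{align}

where the joint distribution of $(\mathsf{A}, \mathsf{B}, \mathsf{X}, \mathsf{Y}_1, \mathsf{Y}_2)$ in (170) and (171) is understood as follows: $(\mathsf{A}, \mathsf{B}, \mathsf{X})$ is
distributed according to $q$ and $(\mathsf{Y}_1, \mathsf{Y}_2)$ conditionally depends on $\mathsf{X}$ via the true side information channel
(i.e., the conditional distribution $\mathbb{P}[Y_1 = y_1, Y_2 = y_2|X = x]$); in particular, we have imposed the Markov
chain $(\mathsf{A}, \mathsf{B}) \leftrightarrow \mathsf{X} \leftrightarrow (\mathsf{Y}_1, \mathsf{Y}_2)$. We also notice that

\begin{align}
\mathbb{E}_C\{g_1(q_C)\} &= I(X; B|Y_2, C) - H(X|A, C, Y_1) \tag{174}\\
\mathbb{E}_C\{g_2(q_C)\} &= I(X; A|Y_1, C) - H(X|B, C, Y_2) \tag{175}\\
\mathbb{E}_C\{g_3(q_C)\} &= \min_{\phi_1 : \mathcal{A} \times \mathcal{C} \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1} \mathbb{E}\,\delta_1\big(X, \phi_1(A, C, Y_1)\big) \tag{176}\\
\mathbb{E}_C\{g_4(q_C)\} &= \min_{\phi_2 : \mathcal{B} \times \mathcal{C} \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2} \mathbb{E}\,\delta_2\big(X, \phi_2(B, C, Y_2)\big). \tag{177}
\end{align}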

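The Support Lemma asserts that there exists a new auxiliary random variable $C^\dagger$ defined on an alphabet
$\mathcal{C}^\dagger$ with cardinality

$$|\mathcal{C}^\dagger| \le |\mathcal{X}| + 3 \tag{178}$$

together with a collection of $|\mathcal{C}^\dagger|$ distributions $\{q_c^\dagger\}$ from $\mathcal{P}_1$, indexed by the elements $c$ of $\mathcal{C}^\dagger$, such
that

$$\mathbb{E}_C\{g_x(q_C)\} = \mathbb{E}_{C^\dagger}\{g_x(q^\dagger_{C^\dagger})\}, \quad \forall x \in \mathcal{X} \text{ except } x^* \tag{179}$$

and

$$\mathbb{E}_C\{g_j(q_C)\} = \mathbb{E}_{C^\dagger}\{g_j(q^\dagger_{C^\dagger})\}, \quad \forall j = 1, 2, 3, 4. \tag{180}$$

The new variable $C^\dagger$, the distributions $\{q_c^\dagger\}$, and the true side information channel come together via
the Markov chain

$$(A^\dagger, B^\dagger, C^\dagger) \leftrightarrow X^\dagger \leftrightarrow (Y_1^\dagger, Y_2^\dagger) \tag{181}$$

to specify a tuple $(A^\dagger, B^\dagger, C^\dagger, X^\dagger, Y_1^\dagger, Y_2^\dagger)$ on $\mathcal{A} \times \mathcal{B} \times \mathcal{C}^\dagger \times \mathcal{X} \times \mathcal{Y}_1 \times \mathcal{Y}_2$. The equality (179) ensures
that $(X^\dagger, Y_1^\dagger, Y_2^\dagger)$ and $(X, Y_1, Y_2)$ have the same distribution, which also implies

$$H(X^\dagger|Y_1^\dagger) = H(X|Y_1) \quad\text{and}\quad H(X^\dagger|Y_2^\dagger) = H(X|Y_2). \tag{182}$$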

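Similarly, (180) ensures

\begin{align}
I(X^\dagger; B^\dagger|Y_2^\dagger, C^\dagger) - H(X^\dagger|A^\dagger, C^\dagger, Y_1^\dagger) &= I(X; B|Y_2, C) - H(X|A, C, Y_1) \tag{183a}\\
I(X^\dagger; A^\dagger|Y_1^\dagger, C^\dagger) - H(X^\dagger|B^\dagger, C^\dagger, Y_2^\dagger) &= I(X; A|Y_1, C) - H(X|B, C, Y_2); \tag{183b}
\end{align}

and

\begin{align}
\min_{\phi_1^\dagger : \mathcal{A} \times \mathcal{C}^\dagger \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1} \mathbb{E}\,\delta_1\big(X^\dagger, \phi_1^\dagger(A^\dagger, C^\dagger, Y_1^\dagger)\big) &= \min_{\phi_1 : \mathcal{A} \times \mathcal{C} \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1} \mathbb{E}\,\delta_1\big(X, \phi_1(A, C, Y_1)\big) \tag{184a}\\
\min_{\phi_2^\dagger : \mathcal{B} \times \mathcal{C}^\dagger \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2} \mathbb{E}\,\delta_2\big(X^\dagger, \phi_2^\dagger(B^\dagger, C^\dagger, Y_2^\dagger)\big) &= \min_{\phi_2 : \mathcal{B} \times \mathcal{C} \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2} \mathbb{E}\,\delta_2\big(X, \phi_2(B, C, Y_2)\big). \tag{184b}
\end{align}

Finally, the equalities (182) and (183) together give

\begin{align}
&\max_{j=1,2} I(X^\dagger; C^\dagger|Y_j^\dagger) + I(X^\dagger; A^\dagger|C^\dagger, Y_1^\dagger) + I(X^\dagger; B^\dagger|C^\dagger, Y_2^\dagger) \notag\\
&\qquad = \max_{j=1,2} I(X; C|Y_j) + I(X; A|C, Y_1) + I(X; B|C, Y_2). \tag{185}
\end{align}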



Consider the tuple $(A^\dagger, B^\dagger, C^\dagger, X^\dagger, Y_1^\dagger, Y_2^\dagger)$. We have the Markov chain (181) by construction, and
we notice that $A^\dagger$ and $B^\dagger$ always appear separately in (183) and (184). We may therefore replace the
joint distribution of $(A^\dagger, B^\dagger, C^\dagger, X^\dagger, Y_1^\dagger, Y_2^\dagger)$ with another that shares the same Markov chain (181) and
marginals $(A^\dagger, C^\dagger, X^\dagger)$, $(B^\dagger, C^\dagger, X^\dagger)$ and $(X^\dagger, Y_1^\dagger, Y_2^\dagger)$, but imposes the new chain

$$A^\dagger \leftrightarrow (C^\dagger, X^\dagger) \leftrightarrow B^\dagger. \tag{186}$$

Or put another way, the Markov chain (186) does not alter the left hand sides of (183) or (184). The
chain (186) will be important in the sequel because it allows the cardinalities of $\mathcal{A}$ and $\mathcal{B}$ to be bounded
independently. With a slight abuse of notation, we retain the same notation $(A^\dagger, B^\dagger, C^\dagger, X^\dagger, Y_1^\dagger, Y_2^\dagger)$ for
this new distribution.

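Consider the variable $A^\dagger$. For each and every $a$ in the support set of $A^\dagger$, let $q_a$ denote the conditional
distribution of $(C^\dagger, X^\dagger)$ given $A^\dagger = a$. Let $\mathcal{P}_2$ denote the set of all joint distributions on $\mathcal{C}^\dagger \times \mathcal{X}$. For
each and every $(c, x)$ in $\mathcal{C}^\dagger \times \mathcal{X}$ but one, define $g_{c,x} : \mathcal{P}_2 \to [0, 1]$ by setting

$$g_{c,x}(q) \triangleq q(c, x). \tag{187}$$

Here $\mathbb{E}_{A^\dagger}\{g_{c,x}(q_{A^\dagger})\} = \mathbb{P}[(C^\dagger, X^\dagger) = (c, x)]$ returns the desired probability for all $(c, x)$ in $\mathcal{C}^\dagger \times \mathcal{X}$ but
one. In addition, define

$$g_5(q) \triangleq H(\mathsf{X}|\mathsf{C}, \mathsf{Y}_1) \tag{188}$$

and

$$g_6(q) \triangleq \sum_{c \in \mathcal{C}^\dagger} \sum_{y_1 \in \mathcal{Y}_1} \min_{\hat{x} \in \hat{\mathcal{X}}_1} \sum_{x \in \mathcal{X}} \sum_{y_2 \in \mathcal{Y}_2} q(c, x)\, p(y_1, y_2|x)\, \delta_1(x, \hat{x}), \tag{189}$$

where the joint distribution of $(\mathsf{C}, \mathsf{X}, \mathsf{Y}_1, \mathsf{Y}_2)$ is understood as follows: $(\mathsf{C}, \mathsf{X})$ is distributed according to
$q$, and $(\mathsf{Y}_1, \mathsf{Y}_2)$ conditionally depends on $\mathsf{X}$ via the true side information channel. We have

$$\mathbb{E}_{A^\dagger}\{g_5(q_{A^\dagger})\} = H(X^\dagger|A^\dagger, C^\dagger, Y_1^\dagger) \tag{190}$$

and

$$\mathbb{E}_{A^\dagger}\{g_6(q_{A^\dagger})\} = \min_{\phi_1^\dagger : \mathcal{A} \times \mathcal{C}^\dagger \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1} \mathbb{E}\,\delta_1\big(X^\dagger, \phi_1^\dagger(A^\dagger, C^\dagger, Y_1^\dagger)\big). \tag{191}$$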

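The Support Lemma asserts that there exists a random variable $A^\ddagger$ defined on an alphabet $\mathcal{A}^\ddagger$ with
cardinality

$$|\mathcal{A}^\ddagger| \le |\mathcal{C}^\dagger||\mathcal{X}| + 1 \tag{192}$$

together with a collection of $|\mathcal{A}^\ddagger|$ distributions $\{q_a^\ddagger\}$ from $\mathcal{P}_2$, indexed by the elements $a$ of $\mathcal{A}^\ddagger$,
such that

$$\mathbb{E}_{A^\dagger}\{g_{c,x}(q_{A^\dagger})\} = \mathbb{E}_{A^\ddagger}\{g_{c,x}(q^\ddagger_{A^\ddagger})\} \tag{193}$$

and

$$\mathbb{E}_{A^\dagger}\{g_j(q_{A^\dagger})\} = \mathbb{E}_{A^\ddagger}\{g_j(q^\ddagger_{A^\ddagger})\}, \quad j = 5, 6. \tag{194}$$

The new variable $A^\ddagger$, the distributions $\{q_a^\ddagger\}$, the true side information channel, the conditional
distribution $\mathbb{P}(B^\dagger|X^\dagger, C^\dagger)$, and the Markov chains (181) and (186) come together to specify a tuple
$(A^\ddagger, B^\ddagger, C^\ddagger, X^\ddagger, Y_1^\ddagger, Y_2^\ddagger)$ on $\mathcal{A}^\ddagger \times \mathcal{B} \times \mathcal{C}^\dagger \times \mathcal{X} \times \mathcal{Y}_1 \times \mathcal{Y}_2$.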

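The equalities in (193) ensure that $(C^\ddagger, X^\ddagger)$ and $(C^\dagger, X^\dagger)$ have the same distribution. By construction,
we also have that $(B^\ddagger, C^\ddagger, X^\ddagger, Y_1^\ddagger, Y_2^\ddagger)$ and $(B^\dagger, C^\dagger, X^\dagger, Y_1^\dagger, Y_2^\dagger)$ have the same distribution, and therefore

\begin{align}
&\max\big\{I(X^\ddagger; C^\ddagger|Y_1^\ddagger),\, I(X^\ddagger; C^\ddagger|Y_2^\ddagger)\big\} + H(X^\ddagger|C^\ddagger, Y_1^\ddagger) + I(X^\ddagger; B^\ddagger|C^\ddagger, Y_2^\ddagger) \notag\\
&\qquad = \max\big\{I(X^\dagger; C^\dagger|Y_1^\dagger),\, I(X^\dagger; C^\dagger|Y_2^\dagger)\big\} + H(X^\dagger|C^\dagger, Y_1^\dagger) + I(X^\dagger; B^\dagger|C^\dagger, Y_2^\dagger). \tag{195}
\end{align}

In addition, (194) ensures that

$$H(X^\ddagger|A^\ddagger, C^\ddagger, Y_1^\ddagger) = H(X^\dagger|A^\dagger, C^\dagger, Y_1^\dagger) \tag{196}$$

and

$$\min_{\phi_1^\ddagger} \mathbb{E}\,\delta_1\big(X^\ddagger, \phi_1^\ddagger(A^\ddagger, C^\ddagger, Y_1^\ddagger)\big) = \min_{\phi_1^\dagger} \mathbb{E}\,\delta_1\big(X^\dagger, \phi_1^\dagger(A^\dagger, C^\dagger, Y_1^\dagger)\big). \tag{197}$$

Combining (185), (184), (195), (196) and (197) gives

\begin{align}
&\max\big\{I(X^\ddagger; C^\ddagger|Y_1^\ddagger),\, I(X^\ddagger; C^\ddagger|Y_2^\ddagger)\big\} + I(X^\ddagger; A^\ddagger|C^\ddagger, Y_1^\ddagger) + I(X^\ddagger; B^\ddagger|C^\ddagger, Y_2^\ddagger) \notag\\
&\qquad = \max\big\{I(X; C|Y_1),\, I(X; C|Y_2)\big\} + I(X; A|C, Y_1) + I(X; B|C, Y_2) \tag{198}
\end{align}

and

\begin{align}
\min_{\phi_1^\ddagger : \mathcal{A}^\ddagger \times \mathcal{C}^\dagger \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1} \mathbb{E}\,\delta_1\big(X^\ddagger, \phi_1^\ddagger(A^\ddagger, C^\ddagger, Y_1^\ddagger)\big) &= \min_{\phi_1 : \mathcal{A} \times \mathcal{C} \times \mathcal{Y}_1 \to \hat{\mathcal{X}}_1} \mathbb{E}\,\delta_1\big(X, \phi_1(A, C, Y_1)\big) \tag{199a}\\
\min_{\phi_2^\ddagger : \mathcal{B} \times \mathcal{C}^\dagger \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2} \mathbb{E}\,\delta_2\big(X^\ddagger, \phi_2^\ddagger(B^\ddagger, C^\ddagger, Y_2^\ddagger)\big) &= \min_{\phi_2 : \mathcal{B} \times \mathcal{C} \times \mathcal{Y}_2 \to \hat{\mathcal{X}}_2} \mathbb{E}\,\delta_2\big(X, \phi_2(B, C, Y_2)\big), \tag{199b}
\end{align}

as desired.

Using analogous arguments as above, we can find a random vector $(A', B', C', X', Y_1', Y_2')$ over $\mathcal{A}^\ddagger \times \mathcal{B}' \times \mathcal{C}^\dagger \times \mathcal{X} \times \mathcal{Y}_1 \times \mathcal{Y}_2$,
where the cardinality of the alphabet $\mathcal{B}'$ satisfies

$$|\mathcal{B}'| \le |\mathcal{C}^\dagger||\mathcal{X}| + 1, \tag{200}$$

and such that (198) and (199) are satisfied when the tuple $(A^\ddagger, B^\ddagger, C^\ddagger, X^\ddagger, Y_1^\ddagger, Y_2^\ddagger)$ is replaced by the
new tuple $(A', B', C', X', Y_1', Y_2')$. This concludes the proof of the cardinality bounds. ■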

Appendix C
Proof of Lemma 4

A. Assertion (i)

Consider any auxiliary random variable $W$ for which

$$W \leftrightarrow (X, L) \leftrightarrow (Y_1, Y_2) \tag{201}$$

is a Markov chain. We have

\begin{align}
I(W; Y_2|L) &= H(W|L) - H(W|L, Y_2) \tag{202}\\
&\overset{(a)}{=} H(W|L) - H(W|L, Y_2, Y_1) \tag{203}\\
&\ge H(W|L) - H(W|L, Y_1) \tag{204}\\
&= I(W; Y_1|L), \tag{205}
\end{align}

where (a) uses the fact that

$$W \leftrightarrow (Y_2, L) \leftrightarrow Y_1, \tag{206}$$

which follows from (201), the Markov chain (53), and the fact that the side information is physically
degraded. ■
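
The inequality chain (202)-(205) can be illustrated numerically. The sketch below is an assumption-laden toy check, not part of the proof: it takes $L$ to be trivial (constant), draws random conditional distributions with hypothetical alphabet sizes, builds a joint pmf obeying $W \leftrightarrow X \leftrightarrow Y_2 \leftrightarrow Y_1$, and compares $I(W;Y_2)$ with $I(W;Y_1)$. All function and variable names are illustrative.

```python
import itertools
import random
from math import log2


def random_channel(n_in, n_out, rng):
    """Random row-stochastic matrix: row i is a pmf on outputs given input i."""
    rows = []
    for _ in range(n_in):
        weights = [rng.random() for _ in range(n_out)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def entropy(pmf, idx):
    """Entropy in bits of the marginal on the coordinates listed in idx."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def mutual_information(pmf, u, v):
    """Mutual information I(U; V) in bits for disjoint index tuples u and v."""
    return entropy(pmf, u) + entropy(pmf, v) - entropy(pmf, u + v)


rng = random.Random(1)
p_x = random_channel(1, 3, rng)[0]        # source pmf on a ternary alphabet
p_w_given_x = random_channel(3, 2, rng)   # auxiliary W generated from X
p_y2_given_x = random_channel(3, 3, rng)  # better side information Y2
p_y1_given_y2 = random_channel(3, 2, rng) # Y1 is a physically degraded version of Y2

# Joint pmf on (w, x, y2, y1) obeying W - X - Y2 - Y1.
pmf = {
    (w, x, y2, y1): p_x[x] * p_w_given_x[x][w] * p_y2_given_x[x][y2] * p_y1_given_y2[y2][y1]
    for w, x, y2, y1 in itertools.product(range(2), range(3), range(3), range(2))
}

i_w_y2 = mutual_information(pmf, (0,), (2,))
i_w_y1 = mutual_information(pmf, (0,), (3,))
print(f"I(W;Y2) = {i_w_y2:.4f} >= I(W;Y1) = {i_w_y1:.4f}: {i_w_y2 >= i_w_y1 - 1e-12}")
```

With a non-trivial $L$ the same comparison would be made on the conditional quantities $I(W;Y_2|L)$ and $I(W;Y_1|L)$; the constant-$L$ case is used here only to keep the sketch short.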



B. Assertion (ii)

Take any auxiliary random variable $W$ for which

$$W \leftrightarrow (X_1, X_2) \leftrightarrow (Y_1, Y_2). \tag{207}$$

Consider Definition 6 with $L = X_1$. We have

\begin{align}
0 &\le I(W; Y_1|X_1) \tag{208}\\
&= H(Y_1|X_1) - H(Y_1|W, X_1) \tag{209}\\
&\overset{(a)}{=} H(Y_1|X_1, X_2) - H(Y_1|W, X_1) \tag{210}\\
&\overset{(b)}{=} H(Y_1|X_1, X_2) - H(Y_1|W, X_1, X_2) \tag{211}\\
&= I(W; Y_1|X_1, X_2) \tag{212}\\
&\overset{(c)}{=} 0, \tag{213}
\end{align}

where the indicated steps apply the following Markov chains:

\begin{align}
&\text{(a)}\quad X_2 \leftrightarrow X_1 \leftrightarrow Y_1 \notag\\
&\text{(b)}\quad X_2 \leftrightarrow (W, X_1) \leftrightarrow Y_1 \tag{214}\\
&\text{(c)}\quad W \leftrightarrow (X_1, X_2) \leftrightarrow (Y_1, Y_2). \notag
\end{align}

Thus, we have that

$$I(W; Y_1|X_1) = 0 \tag{215}$$

and therefore $I(W; Y_1|X_1)$ is no larger than $I(W; Y_2|X_1)$. ■

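Appendix D
Proof of Lemma 5

Let

$$P_{e,i} \triangleq \mathbb{P}\big[\hat{X}_{1,i} \ne \tilde{X}_i\big] \tag{216}$$

denote the probability that the $i$-th symbol $\tilde{X}_i = \psi(X_i)$ is reconstructed in error at Receiver 1. The
probability $P_{e,i}$ can also be expressed as $P_{e,i} = \mathbb{E}\,\delta_1(X_i, \hat{X}_{1,i})$ and, therefore, we have

$$\frac{1}{n}\sum_{i=1}^{n} P_{e,i} \le \epsilon \tag{217}$$

from the definition of achievability. Consider the conditional entropy $H(\tilde{\mathbf{X}}|M, \mathbf{Y}_1)$. Starting from the
fact that $\hat{\mathbf{X}}_1$ is determined by $(M, \mathbf{Y}_1)$, we have

\begin{align}
H(\tilde{\mathbf{X}}|M, \mathbf{Y}_1) &\overset{(a)}{=} H(\tilde{\mathbf{X}}|M, \mathbf{Y}_1, \hat{\mathbf{X}}_1) \tag{218}\\
&\le H(\tilde{\mathbf{X}}|\hat{\mathbf{X}}_1) \tag{219}\\
&\overset{(b)}{\le} \sum_{i=1}^{n} H(\tilde{X}_i|\hat{X}_{1,i}) \tag{220}\\
&\overset{(c)}{\le} \sum_{i=1}^{n} \big( h(P_{e,i}) + P_{e,i} \log|\mathcal{X}| \big) \tag{221}\\
&\overset{(d)}{\le} n\, h\Big(\frac{1}{n}\sum_{i=1}^{n} P_{e,i}\Big) + \Big(\sum_{i=1}^{n} P_{e,i}\Big) \log|\mathcal{X}| \tag{222}\\
&\overset{(e)}{\le} n\, h(\epsilon) + n\epsilon \log|\mathcal{X}| \tag{223}\\
&\overset{(f)}{=} n\, \varepsilon(n, \epsilon), \tag{224}
\end{align}

where (a) applies the Markov chain

$$\tilde{\mathbf{X}} \leftrightarrow (M, \mathbf{Y}_1) \leftrightarrow \hat{\mathbf{X}}_1; \tag{225}$$

(b) invokes the chain rule for entropy and the fact that conditioning cannot increase entropy; (c) applies
Fano's inequality; (d) combines the concavity of the binary entropy function with Jensen's inequality;
(e) invokes (217); and (f) substitutes

$$\varepsilon(n, \epsilon) \triangleq h(\epsilon) + \epsilon \log|\mathcal{X}|. \tag{226}$$

Finally, we notice that $\varepsilon(n, \epsilon) \to 0$ as $\epsilon \to 0$.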
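
To make the vanishing of the Fano term in (226) concrete, the following sketch evaluates $h(\epsilon) + \epsilon \log_2 |\mathcal{X}|$ for a few error levels. The alphabet size 4 and the function names are hypothetical choices made only for illustration, and logarithms are taken to base 2.

```python
from math import log2


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def fano_slack(eps, alphabet_size):
    """Per-letter slack h(eps) + eps * log2(alphabet_size), as in (226)."""
    return binary_entropy(eps) + eps * log2(alphabet_size)


# Hypothetical alphabet size, chosen only for illustration.
alphabet_size = 4
error_levels = (0.1, 0.01, 0.001, 0.0)
slacks = [fano_slack(eps, alphabet_size) for eps in error_levels]
for eps, slack in zip(error_levels, slacks):
    print(f"eps = {eps:<6} slack = {slack:.5f} bits")
```

The slack decreases monotonically over these error levels and is exactly zero at $\epsilon = 0$, which is the only property of $\varepsilon(n, \epsilon)$ that the converse arguments rely on.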



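Now consider the rate condition (12). We have

\begin{align}
R + \epsilon &\ge \frac{1}{n} \log_2 |\mathcal{M}| \tag{227}\\
&\ge \frac{1}{n} H(M) \tag{228}\\
&\ge \frac{1}{n} H(M|\mathbf{Y}_1) \tag{229}\\
&\ge \frac{1}{n} I(\mathbf{X}, \tilde{\mathbf{X}}; M|\mathbf{Y}_1) \tag{230}\\
&= \frac{1}{n} \Big( I(\tilde{\mathbf{X}}; M|\mathbf{Y}_1) + I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_1) \Big) \tag{231}\\
&\overset{(a)}{\ge} \frac{1}{n} \Big( H(\tilde{\mathbf{X}}|\mathbf{Y}_1) - n\,\varepsilon(n, \epsilon) + I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_1) \Big) \tag{232}\\
&\overset{(b)}{=} H(\tilde{X}|Y_1) - \varepsilon(n, \epsilon) + \frac{1}{n} I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_1), \tag{233}
\end{align}

where (a) substitutes (224) and (b) invokes the fact that $(\mathbf{X}, \tilde{\mathbf{X}}, \mathbf{Y}_1)$ is i.i.d.

Consider the conditional mutual information term on the right hand side of (233). Rearranging this
term, with the intent of conditioning on $(\tilde{\mathbf{X}}, \mathbf{Y}_2)$ instead of $(\tilde{\mathbf{X}}, \mathbf{Y}_1)$, we obtain

\begin{align}
I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_1) &\overset{(a)}{=} I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_2) - H(M|\tilde{\mathbf{X}}, \mathbf{Y}_2) + H(M|\tilde{\mathbf{X}}, \mathbf{Y}_1) \notag\\
&= I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_2) + I(M; \mathbf{Y}_2|\tilde{\mathbf{X}}) - I(M; \mathbf{Y}_1|\tilde{\mathbf{X}}), \tag{234}
\end{align}

where (a) invokes that $M$ is a function of $\mathbf{X}$ or, in the more general case of stochastic encoders, that

$$M \leftrightarrow \mathbf{X} \leftrightarrow (\tilde{\mathbf{X}}, \mathbf{Y}_1, \mathbf{Y}_2). \tag{235}$$

Consider the first conditional mutual information on the right hand side of (234). Expand this term
using the method of Wyner and Ziv [1, Eqn. (52)] as follows:

\begin{align}
I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_2) &= \sum_{i=1}^{n} I\big(X_i; M \,\big|\, \tilde{\mathbf{X}}, \mathbf{Y}_2, X^{i-1}\big) \tag{236}\\
&\overset{(a)}{=} \sum_{i=1}^{n} I\big(X_i; M, X^{i-1}, \tilde{X}^{i-1}, \tilde{X}_{i+1}^{n}, Y_{2,1}^{i-1}, Y_{2,i+1}^{n} \,\big|\, \tilde{X}_i, Y_{2,i}\big) \tag{237}\\
&\ge \sum_{i=1}^{n} I\big(X_i; M, Y_{2,1}^{i-1}, Y_{2,i+1}^{n} \,\big|\, \tilde{X}_i, Y_{2,i}\big) \tag{238}\\
&\overset{(b)}{=} \sum_{i=1}^{n} I\big(X_i; B_i \,\big|\, \tilde{X}_i, Y_{2,i}\big), \tag{239}
\end{align}

where (a) follows because $(\mathbf{X}, \mathbf{Y}_2, \tilde{\mathbf{X}})$ is i.i.d. and therefore

$$H\big(X_i \,\big|\, \tilde{\mathbf{X}}, \mathbf{Y}_2, X^{i-1}\big) = H(X_i|\tilde{X}_i, Y_{2,i}), \tag{240}$$

and in (b) we define

$$B_i \triangleq \big(M, Y_{2,1}^{i-1}, Y_{2,i+1}^{n}\big). \tag{241}$$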


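Continuing on from (239), we have

\begin{align}
\frac{1}{n} I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_2) &\ge \frac{1}{n} \sum_{i=1}^{n} I(X_i; B_i|\tilde{X}_i, Y_{2,i}) \tag{242}\\
&\overset{(a)}{\ge} \frac{1}{n} \sum_{i=1}^{n} S\big(\mathbb{E}\,\delta_2(X_i, \hat{X}_{2,i})\big) \tag{243}\\
&\overset{(b)}{\ge} S\Big(\mathbb{E}\,\frac{1}{n} \sum_{i=1}^{n} \delta_2(X_i, \hat{X}_{2,i})\Big) \tag{244}\\
&\overset{(c)}{\ge} S(D_2 + \epsilon), \tag{245}
\end{align}

where

(a) follows from the definition of $S(D_2)$ upon noticing that the $i$-th reconstructed symbol, $\hat{X}_{2,i}$, can be
expressed as a deterministic function of $(B_i, Y_{2,i})$ and

$$B_i \leftrightarrow X_i \leftrightarrow (Y_{1,i}, Y_{2,i}); \tag{246}$$

(b) combines the convexity of $S(D_2)$ in $D_2$ with Jensen's inequality; and

(c) $S(D_2)$ is non-increasing in $D_2$ and

$$D_2 + \epsilon \ge \mathbb{E}\,\frac{1}{n} \sum_{i=1}^{n} \delta_2(X_i, \hat{X}_{2,i}). \tag{247}$$

Consider (233), (234) and (245). We have

$$R + \epsilon \ge H(\tilde{X}|Y_1) - \varepsilon(n, \epsilon) + S(D_2 + \epsilon) + \frac{1}{n} \Big( I(M; \mathbf{Y}_2|\tilde{\mathbf{X}}) - I(M; \mathbf{Y}_1|\tilde{\mathbf{X}}) \Big). \tag{248}$$

We now apply Lemma 1 with

$$\mathbf{R} = \mathbf{X}, \quad \mathbf{S}_1 = \mathbf{Y}_1, \quad \mathbf{S}_2 = \mathbf{Y}_2, \quad \mathbf{T} = \emptyset, \quad \mathbf{L} = \tilde{\mathbf{X}} \quad\text{and}\quad J = M. \tag{249}$$

There exists $W$, jointly distributed with $(X, Y_1, Y_2)$, such that

$$W \leftrightarrow X \leftrightarrow (Y_1, Y_2), \tag{250}$$

$|\mathcal{W}| \le |\mathcal{X}|$, and

$$R + \epsilon \ge H(\tilde{X}|Y_1) - \varepsilon(n, \epsilon) + S(D_2 + \epsilon) + I(W; Y_2|\tilde{X}) - I(W; Y_1|\tilde{X}). \tag{251}$$

The converse proof is completed by letting $\epsilon \to 0$ and invoking the continuity of $S(D_2)$ in $D_2$. ■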
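Appendix E
Proof of Corollary 3.1

Choose $C = \tilde{X}$ in Theorem 3 and apply the definition of $S(D_2)$ to obtain

$$R(0, D_2) \le H(\tilde{X}|Y_1) + S(D_2). \tag{252}$$

The reverse inequality can be proved using a short converse; specifically, we have

\begin{align}
H(M) &\ge I(\mathbf{X}, \tilde{\mathbf{X}}, \mathbf{Y}_1, \mathbf{Y}_2; M) \tag{253}\\
&\ge I(\tilde{\mathbf{X}}; M|\mathbf{Y}_1) + I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_1, \mathbf{Y}_2) \tag{254}\\
&\overset{(a)}{=} H(\tilde{\mathbf{X}}|\mathbf{Y}_1) - H(\tilde{\mathbf{X}}|M, \mathbf{Y}_1) + I(\mathbf{X}; M|\tilde{\mathbf{X}}, \mathbf{Y}_2) \tag{255}\\
&\overset{(b)}{\ge} n \Big( H(\tilde{X}|Y_1) - \varepsilon(n, \epsilon) + S(D_2 + \epsilon) \Big), \tag{256}
\end{align}

where (a) applies $M \leftrightarrow (\tilde{\mathbf{X}}, \mathbf{Y}_2) \leftrightarrow \mathbf{Y}_1$ and (b) repeats the steps in (224) and (245), where $\varepsilon(n, \epsilon)$ can be
chosen so that $\varepsilon(n, \epsilon) \to 0$ as $\epsilon \to 0$. ■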

Appendix F
Proof of Lemmas 7 and 11

Lemmas 7 and 11 are both special cases of the next theorem.

Theorem 13 (Thm. 1, [5]): Let $(U_{123}, U_{12}, U_{13}, U_{23}, U_1, U_2, U_3)$ be any tuple of auxiliary random
variables, jointly distributed with the source $(X, Y_1, Y_2, Y_3)$, such that

(i) there is a Markov chain

$$(Y_1, Y_2, Y_3) \leftrightarrow X \leftrightarrow (U_{123}, U_{12}, U_{13}, U_{23}, U_1, U_2, U_3); \tag{257}$$

(ii) there exist three (deterministic) maps

$$\phi_j : \mathcal{U}_j \times \mathcal{Y}_j \to \hat{\mathcal{X}}_j, \quad j = 1, 2, 3, \tag{258a}$$

with

$$D_j \ge \mathbb{E}\,\delta_j\big(X, \phi_j(U_j, Y_j)\big). \tag{258b}$$

Then, for each such tuple of auxiliary random variables, any rate tuple $(R_1, R_2, R_3)$ satisfying the
following inequalities is achievable with distortions $(D_1, D_2, D_3)$:

\begin{align}
R_1 &\ge I(X; U_{123}) - I(U_{123}; Y_1) \notag\\
&\quad + I(X; U_{12}|U_{123}) - I(U_{12}; Y_1|U_{123}) \notag\\
&\quad + I(X, U_{12}; U_{13}|U_{123}) - I(U_{13}; U_{12}, Y_1|U_{123}) \notag\\
&\quad + I(X; U_1|U_{123}, U_{12}, U_{13}) - I(U_1; Y_1|U_{123}, U_{12}, U_{13}) \tag{259a}\\
R_1 + R_2 &\ge I(X; U_{123}) - \min\{I(U_{123}; Y_1),\, I(U_{123}; Y_2)\} \notag\\
&\quad + I(X; U_{12}|U_{123}) - \min\{I(U_{12}; Y_1|U_{123}),\, I(U_{12}; Y_2|U_{123})\} \notag\\
&\quad + I(X, U_{12}; U_{13}|U_{123}) - I(U_{13}; U_{12}, Y_1|U_{123}) \notag\\
&\quad + I(X, U_{12}, U_{13}; U_{23}|U_{123}) - I(U_{23}; U_{12}, Y_2|U_{123}) \notag\\
&\quad + I(X; U_1|U_{123}, U_{12}, U_{13}) - I(U_1; Y_1|U_{123}, U_{12}, U_{13}) \notag\\
&\quad + I(X; U_2|U_{123}, U_{12}, U_{23}) - I(U_2; Y_2|U_{123}, U_{12}, U_{23}) \tag{259b}\\
R_1 + R_2 + R_3 &\ge I(X; U_{123}) - \min\{I(U_{123}; Y_1),\, I(U_{123}; Y_2),\, I(U_{123}; Y_3)\} \notag\\
&\quad + I(X; U_{12}|U_{123}) - \min\{I(U_{12}; Y_1|U_{123}),\, I(U_{12}; Y_2|U_{123})\} \notag\\
&\quad + I(X, U_{12}; U_{13}|U_{123}) - \min\{I(U_{13}; U_{12}, Y_1|U_{123}),\, I(U_{13}; Y_3|U_{123})\} \notag\\
&\quad + I(X, U_{12}, U_{13}; U_{23}|U_{123}) - \min\{I(U_{23}; U_{12}, Y_2|U_{123}),\, I(U_{23}; U_{13}, Y_3|U_{123})\} \notag\\
&\quad + I(X; U_1|U_{123}, U_{12}, U_{13}) - I(U_1; Y_1|U_{123}, U_{12}, U_{13}) \notag\\
&\quad + I(X; U_2|U_{123}, U_{12}, U_{23}) - I(U_2; Y_2|U_{123}, U_{12}, U_{23}) \notag\\
&\quad + I(X; U_3|U_{123}, U_{13}, U_{23}) - I(U_3; Y_3|U_{123}, U_{13}, U_{23}). \tag{259c}
\end{align}

A. Proof of Lemma 7

Suppose that the auxiliary random variables $(A_1, A_2, A_3)$ meet the conditions of Lemma 7. Consider
Theorem 13 with $U_{12}$ and $U_{13}$ being constants and

\begin{align}
U_{123} &= U_1 = A_1 \tag{260a}\\
U_{23} &= U_2 = A_2 \tag{260b}\\
U_3 &= A_3. \tag{260c}
\end{align}

The rate constraints of (259) now simplify to those of Lemma 7.

B. Proof of Lemma 11

Suppose that the auxiliary random variables $(A_{12}, A_1, A_2)$ meet the conditions of Lemma 11. Consider
Theorem 13 with infinite $D_3$, set $U_{123}$, $U_{13}$, $U_{23}$ and $U_3$ to be constants, and $U_{12} = A_{12}$, $U_1 = A_1$ and
$U_2 = A_2$. The rate constraints of (259) now simplify to those of Lemma 11.



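Appendix G
Proof of Lemma 9

We have

\begin{align}
R_1 + \epsilon &\ge \frac{1}{n} H(M_1) \tag{261}\\
&\ge \frac{1}{n} I(\mathbf{X}_1; M_1|\mathbf{Y}_1) \tag{262}\\
&\overset{(a)}{\ge} \frac{1}{n} \Big( H(\mathbf{X}_1|\mathbf{Y}_1) - n\,\varepsilon_1(n, \epsilon) \Big) \tag{263}\\
&\overset{(b)}{=} H(X_1|Y_1) - \varepsilon_1(n, \epsilon), \tag{264}
\end{align}

where (a) applies Fano's inequality in the same way as (224), where $\varepsilon_1(n, \epsilon)$ can be chosen so that
$\varepsilon_1(n, \epsilon) \to 0$ as $\epsilon \to 0$; and (b) follows because the pair $(\mathbf{X}_1, \mathbf{Y}_1)$ is i.i.d. Similarly, we have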



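\begin{align}
R_1 + R_2 + \epsilon &\ge \frac{1}{n} H(M_1, M_2) \tag{265}\\
&\ge \frac{1}{n} I(\mathbf{X}_1, \mathbf{X}; M_1, M_2|\mathbf{Y}_1) \tag{266}\\
&= \frac{1}{n} \Big( I(\mathbf{X}_1; M_1, M_2|\mathbf{Y}_1) + I(\mathbf{X}; M_1, M_2|\mathbf{X}_1, \mathbf{Y}_1) \Big) \tag{267}\\
&\overset{(a)}{=} \frac{1}{n} \Big( I(\mathbf{X}_1; M_1, M_2|\mathbf{Y}_1) + I(\mathbf{X}; M_1, M_2|\mathbf{X}_1, \mathbf{Y}_2) + I(\mathbf{Y}_2; M_1, M_2|\mathbf{X}_1) \notag\\
&\qquad\quad - I(\mathbf{Y}_1; M_1, M_2|\mathbf{X}_1) \Big) \tag{268}\\
&\overset{(b)}{=} \frac{1}{n} \Big( I(\mathbf{X}_1; M_1, M_2|\mathbf{Y}_1) + I(\mathbf{X}_2; M_1, M_2|\mathbf{X}_1, \mathbf{Y}_2) + I(\mathbf{X}; M_1, M_2|\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_2) \notag\\
&\qquad\quad + I(\mathbf{Y}_2; M_1, M_2|\mathbf{X}_1) - I(\mathbf{Y}_1; M_1, M_2|\mathbf{X}_1) \Big) \tag{269}\\
&\overset{(c)}{\ge} H(X_1|Y_1) + H(X_2|X_1, Y_2) - \varepsilon_1(n, \epsilon) - \varepsilon_2(n, \epsilon) + \frac{1}{n} \Big( I(\mathbf{X}; M_1, M_2|\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_2) \notag\\
&\qquad\quad + I(\mathbf{Y}_2; M_1, M_2|\mathbf{X}_1) - I(\mathbf{Y}_1; M_1, M_2|\mathbf{X}_1) \Big) \tag{270}\\
&\overset{(d)}{\ge} H(X_1|Y_1) + H(X_2|X_1, Y_2) - \varepsilon_1(n, \epsilon) - \varepsilon_2(n, \epsilon) \notag\\
&\qquad\quad + \frac{1}{n} \Big( I(\mathbf{Y}_2; M_1, M_2|\mathbf{X}_1) - I(\mathbf{Y}_1; M_1, M_2|\mathbf{X}_1) \Big). \tag{271}
\end{align}

The justification for the steps leading to (271) is:

(a) the Markov chain $(M_1, M_2) \leftrightarrow (\mathbf{X}_1, \mathbf{X}) \leftrightarrow (\mathbf{Y}_1, \mathbf{Y}_2)$;

(b) $\mathbf{X}_2$ is determined by $\mathbf{X}$;

(c) exploits the fact that $(\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_1, \mathbf{Y}_2)$ is i.i.d. and applies Fano's inequality twice, in a manner
similar to (224), where $\varepsilon_1(n, \epsilon)$ and $\varepsilon_2(n, \epsilon)$ can be chosen so that they tend to 0 as $\epsilon \to 0$; and

(d) the nonnegativity of conditional mutual information.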

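We now bound the sum rate $R_1 + R_2 + R_3$. Notice that the steps leading to (270) remain valid if
we replace $R_1 + R_2$ by $R_1 + R_2 + R_3$ and the pair of messages $(M_1, M_2)$ by the triple $(M_1, M_2, M_3)$.
Indeed, we have

\begin{align}
R_1 + R_2 + R_3 + \epsilon &\ge H(X_1|Y_1) + H(X_2|X_1, Y_2) - \varepsilon_1(n, \epsilon) - \varepsilon_2(n, \epsilon) \notag\\
&\quad + \frac{1}{n} \Big( I(\mathbf{X}; M_1, M_2, M_3|\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_2) \notag\\
&\qquad\quad + I(\mathbf{Y}_2; M_1, M_2, M_3|\mathbf{X}_1) - I(\mathbf{Y}_1; M_1, M_2, M_3|\mathbf{X}_1) \Big) \tag{272}\\
&\overset{(a)}{=} H(X_1|Y_1) + H(X_2|X_1, Y_2) - \varepsilon_1(n, \epsilon) - \varepsilon_2(n, \epsilon) \notag\\
&\quad + \frac{1}{n} \Big( I(\mathbf{X}; M_1, M_2, M_3|\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_3) \notag\\
&\qquad\quad + I(M_1, M_2, M_3; \mathbf{Y}_3|\mathbf{X}_1, \mathbf{X}_2) - I(M_1, M_2, M_3; \mathbf{Y}_2|\mathbf{X}_1, \mathbf{X}_2) \notag\\
&\qquad\quad + I(\mathbf{Y}_2; M_1, M_2, M_3|\mathbf{X}_1) - I(\mathbf{Y}_1; M_1, M_2, M_3|\mathbf{X}_1) \Big), \tag{273}
\end{align}

where (a) invokes the Markov chain

$$(M_1, M_2, M_3) \leftrightarrow (\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}) \leftrightarrow (\mathbf{Y}_2, \mathbf{Y}_3). \tag{274}$$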



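Consider the first conditional mutual information on the right hand side of (273). We have

$$\frac{1}{n} I(\mathbf{X}; M_1, M_2, M_3|\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_3) \overset{(a)}{\ge} \frac{1}{n} \sum_{i=1}^{n} I\big(X_i; M_1, M_2, M_3, Y_{3,1}^{i-1}, Y_{3,i+1}^{n} \,\big|\, X_{1,i}, X_{2,i}, Y_{3,i}\big) \tag{275}$$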

December 12, 2012 DRAFT 



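\begin{align}
&\overset{(b)}{=} \frac{1}{n} \sum_{i=1}^{n} I\big(X_i; C_i \,\big|\, X_{1,i}, X_{2,i}, Y_{3,i}\big) \tag{276}\\
&\overset{(c)}{\ge} \frac{1}{n} \sum_{i=1}^{n} S'\big(\mathbb{E}\,\delta_3(X_i, \hat{X}_{3,i})\big) \tag{277}\\
&\overset{(d)}{\ge} S'\Big(\mathbb{E}\,\frac{1}{n} \sum_{i=1}^{n} \delta_3(X_i, \hat{X}_{3,i})\Big) \tag{278}\\
&\overset{(e)}{\ge} S'(D_3 + \epsilon), \tag{279}
\end{align}

where (a) follows from the same reasoning as step (a) of (239); in (b), we define

$$C_i \triangleq \big(M_1, M_2, M_3, Y_{3,1}^{i-1}, Y_{3,i+1}^{n}\big); \tag{280}$$

and (c), (d) and (e) each follow the same reasoning as steps (a), (b) and (c) of (245) respectively.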
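From (273) and (279) we obtain:

\begin{align}
R_1 + R_2 + R_3 + \epsilon &\ge H(X_1|Y_1) + H(X_2|X_1, Y_2) + S'(D_3 + \epsilon) \notag\\
&\quad + \frac{1}{n} \Big( I(M_1, M_2, M_3; \mathbf{Y}_3|\mathbf{X}_1, \mathbf{X}_2) - I(M_1, M_2, M_3; \mathbf{Y}_2|\mathbf{X}_1, \mathbf{X}_2) \Big) \notag\\
&\quad + \frac{1}{n} \Big( I(M_1, M_2, M_3; \mathbf{Y}_2|\mathbf{X}_1) - I(M_1, M_2, M_3; \mathbf{Y}_1|\mathbf{X}_1) \Big) - \varepsilon_1(n, \epsilon) - \varepsilon_2(n, \epsilon). \tag{281}
\end{align}

Consider (271) and (281), and apply Lemma 1 twice: once for

$$\mathbf{R} = \mathbf{X}, \quad \mathbf{S}_1 = \mathbf{Y}_1, \quad \mathbf{S}_2 = \mathbf{Y}_2, \quad \mathbf{T} = \mathbf{Y}_3 \quad\text{and}\quad \mathbf{L} = \mathbf{X}_1, \tag{282}$$

and once for

$$\mathbf{R} = \mathbf{X}, \quad \mathbf{S}_1 = \mathbf{Y}_2, \quad \mathbf{S}_2 = \mathbf{Y}_3, \quad \mathbf{T} = \mathbf{Y}_1 \quad\text{and}\quad \mathbf{L} = (\mathbf{X}_1, \mathbf{X}_2). \tag{283}$$

We conclude that there exist auxiliary random variables $W_1$, $W_2$ and $W_3$ with

$$|\mathcal{W}_1|,\, |\mathcal{W}_2|,\, |\mathcal{W}_3| \le |\mathcal{X}|, \tag{284}$$

and

$$W_j \leftrightarrow X \leftrightarrow (Y_1, Y_2, Y_3), \quad j = 1, 2, 3, \tag{285}$$

such that the rate tuple $(R_1, R_2, R_3)$ satisfies

$$R_1 + R_2 + \epsilon \ge H(X_1|Y_1) + H(X_2|X_1, Y_2) + I(W_1; Y_2|X_1) - I(W_1; Y_1|X_1) - \varepsilon_1(n, \epsilon) - \varepsilon_2(n, \epsilon) \tag{286}$$

and

\begin{align}
R_1 + R_2 + R_3 + \epsilon &\ge H(X_1|Y_1) + H(X_2|X_1, Y_2) + S'(D_3 + \epsilon) - \varepsilon_2(n, \epsilon) - \varepsilon_1(n, \epsilon) \notag\\
&\quad + I(W_3; Y_3|X_1, X_2) - I(W_3; Y_2|X_1, X_2) + I(W_2; Y_2|X_1) - I(W_2; Y_1|X_1). \tag{287}
\end{align}

The converse proof follows by (264), (286), and (287), by letting $\epsilon \to 0$, and by the continuity of $S'(D_3)$
in $D_3$. ■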

Appendix H
Proof of Theorem 12

A. Assertion (i)

Achievability: The rate constraints (121) reduce to (125) upon setting $A_1 = X_1$ and $A_{12} = A_2 = X_2$
and invoking the assumptions $X_2 = \psi'(X_1)$ and $H(X_2|Y_1) \le H(X_2|Y_2)$.

Converse: The lower bound on $R_1$ in (125a) is trivial. The lower bound on the sum rate $R_1 + R_2$
in (125b) follows by now-familiar arguments:

\begin{align}
R_1 + R_2 + \epsilon &\ge \frac{1}{n} H(M_1, M_2) \tag{288}\\
&\ge \frac{1}{n} I(\mathbf{X}, \mathbf{X}_2; M_1, M_2|\mathbf{Y}_2) \tag{289}\\
&= \frac{1}{n} \Big( I(\mathbf{X}_2; M_1, M_2|\mathbf{Y}_2) + I(\mathbf{X}; M_1, M_2|\mathbf{X}_2, \mathbf{Y}_2) \Big) \tag{290}\\
&= \frac{1}{n} \Big( I(\mathbf{X}_2; M_1, M_2|\mathbf{Y}_2) + I(\mathbf{X}; M_1, M_2|\mathbf{X}_2, \mathbf{Y}_1) \notag\\
&\qquad\quad + I(M_1, M_2; \mathbf{Y}_1|\mathbf{X}_2) - I(M_1, M_2; \mathbf{Y}_2|\mathbf{X}_2) \Big) \tag{291}\\
&\overset{(a)}{\ge} H(X_2|Y_2) + H(X_1|X_2, Y_1) - \varepsilon(n, \epsilon) \notag\\
&\qquad\quad + \frac{1}{n} \Big( I(M_1, M_2; \mathbf{Y}_1|\mathbf{X}_2) - I(M_1, M_2; \mathbf{Y}_2|\mathbf{X}_2) \Big) \tag{292}\\
&\overset{(b)}{=} H(X_2|Y_2) + H(X_1|X_2, Y_1) - \varepsilon(n, \epsilon) + I(W; Y_1|X_2) - I(W; Y_2|X_2) \tag{293}\\
&\overset{(c)}{\ge} H(X_2|Y_2) + H(X_1|X_2, Y_1) - \varepsilon(n, \epsilon), \tag{294}
\end{align}

where (a) applies Fano's inequality and the fact that $\mathbf{X}_1$ can be computed as a function of $\mathbf{X}$, with $\varepsilon(n, \epsilon) \to 0$ as
$\epsilon \to 0$; (b) uses Lemma 1; and (c) invokes the assumption $(Y_1 \succeq Y_2 \mid X_2)$. ■

B. Assertion (ii)

Achievability: The rate constraints (121) reduce to (127) upon setting $A_{12} = X_1$, $A_2 = X_2$ and $A_1 =$
constant and invoking the assumptions $X_1 = \psi'(X_2)$ and $H(X_1|Y_1) \le H(X_1|Y_2)$.
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Converse: The converse holds because for j = 1, 2, we have Rj > H(Xj\Yj) > 0. ■ 
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